KNOWLEDGE REPRESENTATION 
IN FUZZY LOGIC 

Lotfi A. Zadeh 

Computer Science Division, Department of EECS 
University of California, Berkeley, California 94720 


ABSTRACT 

The conventional approaches to knowledge representation, e.g., semantic 
networks, frames, predicate calculus and Prolog, are based on bivalent logic. A seri- 
ous shortcoming of such approaches is their inability to come to grips with the issue 
of uncertainty and imprecision. As a consequence, the conventional approaches do 
not provide an adequate model for modes of reasoning which are approximate rather 
than exact. Most modes of human reasoning and all of commonsense reasoning fall 
into this category. 

Fuzzy logic, which may be viewed as an extension of classical logical sys- 
tems, provides an effective conceptual framework for dealing with the problem of 
knowledge representation in an environment of uncertainty and imprecision. Mean- 
ing representation in fuzzy logic is based on test-score semantics. In this semantics, 
a proposition is interpreted as a system of elastic constraints, and reasoning is 
viewed as elastic constraint propagation. Our paper presents a summary of the basic 
concepts and techniques underlying the application of fuzzy logic to knowledge 
representation and describes a number of examples relating to its use as a computa- 
tional system for dealing with uncertainty and imprecision in the context of 
knowledge, meaning and inference. 

INTRODUCTION 

Knowledge representation is one of the most basic and actively researched 
areas of AI (Brachman, 1985,1988; Levesque, 1986, 1987; Moore, 1982, 1984; 
Negoita, 1985; Shapiro, 1987; Small, 1988). And yet, there are many important is- 
sues underlying knowledge representation which have not been adequately ad- 
dressed. One such issue is that of the representation of knowledge which is lexically 
imprecise and/or uncertain. 

As a case in point, the conventional knowledge representation techniques 
do not provide effective tools for representing the meaning of or inferring from the 
kind of everyday type facts exemplified by 

(a) Usually it takes about an hour to drive from Berkeley to Stanford in light 
traffic. 

(b) Unemployment is not likely to undergo a sharp decline during the next few 
months. 

(c) Most experts believe that the likelihood of a severe earthquake in the near
future is very low.
The italicized words in these assertions are the labels of fuzzy predicates, 
fuzzy quantifiers and fuzzy probabilities. The conventional approaches to 
knowledge representation lack the means for representing the meaning of fuzzy con- 
cepts. As a consequence, the approaches based on first order logic and classical pro- 
bability theory do not provide an appropriate conceptual framework for dealing with 
the representation of commonsense knowledge, since such knowledge is by its na- 
ture both lexically imprecise and noncategorical (Moore, 1982, 1984; Zadeh, 1984). 

The development of fuzzy logic was motivated in large measure by the 
need for a conceptual framework which can address the issues of uncertainty and 
lexical imprecision. The principal objective of this paper is to present a summary of 
some of the basic ideas underlying fuzzy logic and to describe their application to 
the problem of knowledge representation in an environment of uncertainty and im- 
precision. A more detailed discussion of these ideas may be found in Zadeh (1978a, 
1978b, 1986, 1988a) and other entries in the bibliography. 

ESSENTIALS OF FUZZY LOGIC 

Fuzzy logic, as its name suggests, is the logic underlying modes of reason- 
ing which are approximate rather than exact. The importance of fuzzy logic derives 
from the fact that most modes of human reasoning — and especially commonsense 
reasoning — are approximate in nature. It is of interest to note that, despite its per- 
vasiveness, approximate reasoning falls outside the purview of classical logic largely 
because it is a deeply entrenched tradition in logic to be concerned with those and 
only those modes of reasoning which lend themselves to precise formulation and 
analysis. 

Some of the essential characteristics of fuzzy logic relate to the following. 

In fuzzy logic, exact reasoning is viewed as a limiting case of approximate 
reasoning.

In fuzzy logic, everything is a matter of degree. 

Any logical system can be fuzzified. 

In fuzzy logic, knowledge is interpreted as a collection of elastic or, 
equivalently, fuzzy constraints on a collection of variables. 

Inference is viewed as a process of propagation of elastic constraints. 

Fuzzy logic differs from the traditional logical systems both in spirit and in 
detail. Some of the principal differences are summarized in the following (Zadeh, 
1983b). 

Truth: In bivalent logical systems, truth can have only two values: true or 
false. In multivalued systems, the truth value of a proposition may be an element of 
(a) a finite set; (b) an interval such as [0,1]; or (c) a Boolean algebra. In fuzzy logic, 
the truth value of a proposition may be a fuzzy subset of any partially ordered set but 
usually it is assumed to be a fuzzy subset of the interval [0,1] or, more simply, a 
point in this interval. The so-called linguistic truth values expressed as true, very 
true, not quite true, etc. are interpreted as labels of fuzzy subsets of the unit interval. 

Predicates: In bivalent systems, the predicates are crisp, e.g., mortal, even, 
larger than. In fuzzy logic, the predicates are fuzzy, e.g., tall, ill, soon, swift, much 
larger than. It should be noted that most of the predicates in a natural language are 
fuzzy rather than crisp. 

Predicate Modifiers: In classical systems, the only widely used predicate 
modifier is the negation, not. In fuzzy logic, there is a variety of predicate modifiers 
which act as hedges, e.g., very, more or less, quite, rather, extremely. Such predicate 
modifiers play an essential role in the generation of the values of a linguistic 
variable, e.g., very young, not very young, more or less young, etc., (Zadeh, 1973). 

Quantifiers: In classical logical systems there are just two quantifiers: 
universal and existential. Fuzzy logic admits, in addition, a wide variety of fuzzy 
quantifiers exemplified by few, several, usually, most, almost always, frequently, 
about five, etc. In fuzzy logic, a fuzzy quantifier is interpreted as a fuzzy number or 
a fuzzy proportion (Zadeh, 1983a). 

Probabilities: In classical logical systems, probability is numerical or 
interval-valued. In fuzzy logic, one has the additional option of employing linguistic 
or, more generally, fuzzy probabilities exemplified by likely, unlikely, very likely, 
around 0.8, high, etc. (Zadeh, 1986). Such probabilities may be interpreted as fuzzy 
numbers which may be manipulated through the use of fuzzy arithmetic (Kaufmann 
and Gupta, 1985). 

In addition to fuzzy probabilities, fuzzy logic makes it possible to deal with 
fuzzy events. An example of a fuzzy event is: tomorrow will be a warm day, where 
warm is a fuzzy predicate. The probability of a fuzzy event may be a crisp or fuzzy 
number (Zadeh, 1968). 

It is important to note that from the frequentist point of view there is an 
interchangeability between fuzzy probabilities and fuzzy quantifiers or, more 
generally, fuzzy measures. In this perspective, any proposition which contains labels of 
fuzzy probabilities may be expressed in an equivalent form which contains fuzzy 
quantifiers rather than fuzzy probabilities. 

Possibilities: In contrast to classical modal logic, the concept of possibility 
in fuzzy logic is graded rather than bivalent. Furthermore, as in the case of probabil- 
ities, possibilities may be treated as linguistic variables with values such as possible, 
quite possible, almost impossible, etc. Such values may be interpreted as labels of 
fuzzy subsets of the real line. 

A concept which plays a central role in fuzzy logic is that of a possibility 
distribution (Zadeh, 1978a; Dubois and Prade, 1988; Klir, 1988). Briefly, if X is a 
variable taking values in a universe of discourse U, then the possibility distribution 
of X, $\Pi_X$, is the fuzzy set of all possible values of X. More specifically, let $\pi_X(u)$ 
denote the possibility that X can take the value u, $u \in U$. Then the membership 
function of $\Pi_X$ is numerically equal to the possibility distribution function 
$\pi_X : U \to [0, 1]$, which associates with each element $u \in U$ the possibility that X may take u 
as its value. More about possibilities and possibility distributions will be said at a 
later point in this paper. 
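
As a small illustration of the definition, the sketch below stores a possibility distribution over a finite universe as a mapping from values to grades in [0, 1]. The universe, the grades chosen for "X is a small integer," and the helper names `possibility` and `possibility_of_set` are illustrative assumptions rather than values taken from this paper; the use of the supremum for a crisp set of values follows the standard definition in possibility theory.

```python
# Hypothetical possibility distribution for "X is a small integer".
# The grades below are illustrative assumptions, not data from the paper.
pi_X = {0: 1.0, 1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}


def possibility(u):
    """Return pi_X(u), the possibility that X takes the value u.

    Values outside the listed support are treated as impossible (grade 0).
    """
    return pi_X.get(u, 0.0)


def possibility_of_set(values):
    """Possibility that X lies in a crisp set: the largest pi_X(u) over the set."""
    return max((possibility(u) for u in values), default=0.0)
```

For example, `possibility(3)` reads off the grade assigned to 3, while `possibility_of_set({4, 5, 9})` returns the largest grade among the listed values.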

It is important to observe that in every instance fuzzy logic adds to the op- 
tions which are available in classical logical systems. In this sense, fuzzy logic may 
be viewed as an extension of such systems rather than as a system of reasoning which 
is in conflict with the classical systems. 

Before taking up the issue of knowledge representation in fuzzy logic, it 
will be helpful to take a brief look at some of the principal modes of reasoning in 
fuzzy logic. These are the following, with the understanding that the modes in ques- 
tion are not necessarily disjoint. 

1. Categorical Reasoning 

In this mode of reasoning, the premises contain no fuzzy quantifiers and no fuzzy 
probabilities. A simple example of categorical reasoning is: 
Carol is slim 
Carol is very intelligent 
---------------------------------- 
Carol is slim and very intelligent 

In the premises, slim and very intelligent are assumed to be fuzzy predicates. The 
fuzzy predicate in the conclusion, slim and very intelligent, is the conjunction of slim 
and very intelligent. 

Another example of categorical reasoning is: 

Mary is young 
John is much older than Mary 
---------------------------------- 
John is (much_older $\circ$ young), 

where (much_older $\circ$ young) represents the composition of the binary fuzzy predicate 
much_older with the unary fuzzy predicate young. More specifically, let $\pi_{\text{much\_older}}$ 
and $\pi_{\text{young}}$ denote the possibility distribution functions associated with the fuzzy 
predicates much_older and young, respectively. Then, the possibility distribution 
function of John's age may be expressed as (Zadeh, 1978a) 

$$\pi_{\text{Age(John)}}(u) = \vee_v \left( \pi_{\text{much\_older}}(u, v) \wedge \pi_{\text{young}}(v) \right),$$

where $\vee$ and $\wedge$ stand for max and min, respectively. 
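
The max-min composition above can be sketched on a discretized age scale. The membership functions `young` and `much_older`, their breakpoints, and the age grid are assumptions made for illustration only; the paper does not specify them.

```python
def young(v):
    """Illustrative grade of 'young': 1 up to age 25, falling linearly to 0 at 45."""
    if v <= 25:
        return 1.0
    if v >= 45:
        return 0.0
    return (45 - v) / 20


def much_older(u, v):
    """Illustrative grade of 'u is much older than v'.

    Rises linearly from 0 at a 5-year gap to 1 at a 20-year gap.
    """
    return max(0.0, min(1.0, (u - v - 5) / 15))


def compose(relation, predicate, universe_u, universe_v):
    """Max-min composition: pi(u) = max over v of min(relation(u, v), predicate(v))."""
    return {
        u: max(min(relation(u, v), predicate(v)) for v in universe_v)
        for u in universe_u
    }


AGES = range(0, 101)  # assumed universe of discourse for Age
pi_john = compose(much_older, young, AGES, AGES)
```

With these assumed shapes, `pi_john[u]` is 0 for very small ages (nobody that young can be much older than anyone) and reaches 1 once some fully young age lies at least 20 years below `u`.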

2. Syllogistic Reasoning 

In contrast to categorical reasoning, syllogistic reasoning relates to inference from 
premises containing fuzzy quantifiers (Zadeh, 1985; Dubois and Prade, 1978a). A 
simple example of syllogistic reasoning is the following 

most Swedes are blond 
most blond Swedes are tall 
---------------------------------- 
most$^2$ Swedes are blond and tall 

where the fuzzy quantifier most is interpreted as a fuzzy proportion and most$^2$ is the 
square of most in fuzzy arithmetic (Kaufmann and Gupta, 1985). 
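
One common way to carry out such fuzzy arithmetic is level by level on alpha-cuts, where each cut is an ordinary interval. The sketch below squares a fuzzy proportion this way. The trapezoidal shape assumed for most (support [0.5, 1], core [0.8, 1]) and the function names are illustrative assumptions, not definitions from the paper.

```python
def alpha_cut_trapezoid(a, b, c, d, alpha):
    """Alpha-cut [lo, hi] of a trapezoidal fuzzy number.

    The support is [a, d] and the core is [b, c]; alpha ranges over [0, 1].
    """
    return (a + alpha * (b - a), d - alpha * (d - c))


def square_interval(cut):
    """Square of an interval of non-negative numbers: [lo, hi]^2 = [lo^2, hi^2]."""
    lo, hi = cut
    return (lo * lo, hi * hi)


# Assumed calibration of the fuzzy proportion "most".
MOST = (0.5, 0.8, 1.0, 1.0)


def most_squared_cut(alpha):
    """Alpha-cut of most^2, obtained by squaring the alpha-cut of most."""
    return square_interval(alpha_cut_trapezoid(*MOST, alpha))
```

Under this assumed calibration the support of most$^2$ is [0.25, 1], which reflects the fact that the conclusion of the syllogism is less specific than either premise.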

3. Dispositional Reasoning 

In dispositional reasoning the premises are dispositions, that is, propositions which 
are preponderantly but not necessarily always true (Zadeh, 1987). An example of 
dispositional reasoning is: 

heavy smoking is a leading cause of cancer 
---------------------------------- 
to avoid lung cancer avoid heavy smoking 

Note that in this example the conclusion is a maxim which may be interpreted as a 
dispositional command. Another example of dispositional reasoning is: 

usually the probability of failure is not very low 
usually the probability of failure is not very high 
---------------------------------- 
(2 usually $\ominus$ 1) the probability of failure is not very low and not very high 

In this example, usually is a fuzzy quantifier which is interpreted as a fuzzy 
proportion and 2 usually $\ominus$ 1 is a fuzzy arithmetic expression whose value may be 
computed through the use of fuzzy arithmetic. ($\ominus$ denotes the operation of subtraction in 
fuzzy arithmetic.) It should be noted that the concept of usuality plays a key role in 
dispositional reasoning (Zadeh, 1985, 1987), and is the concept that links together 
the dispositional and syllogistic modes of reasoning. Furthermore, it underlies the 
theories of nonmonotonic and default reasoning (McCarthy, 1980; McDermott, 
1980, 1982; Reiter, 1983). 

4. Qualitative Reasoning 

In fuzzy logic, the term qualitative reasoning refers to a mode of reasoning in which 
the input-output relation of a system is expressed as a collection of fuzzy if-then 
rules in which the antecedents and consequents involve linguistic variables (Zadeh, 
1975, 1989). In this sense, qualitative reasoning in fuzzy logic bears some similarity 
to — but is not coextensive with — qualitative reasoning in AI (de Kleer, 1984; 
Forbus, 1989; Kuipers, 1986). 

A very simple example of qualitative reasoning is: 

volume is small if pressure is high 
volume is large if pressure is low 
---------------------------------- 
volume is ($w_1 \wedge$ small + $w_2 \wedge$ large) if pressure is medium 

where + should be interpreted as infix max; and 

$$w_1 = \sup(\text{high} \wedge \text{medium})$$

and 

$$w_2 = \sup(\text{low} \wedge \text{medium})$$

are weighting coefficients which represent, respectively, the degrees to which the 
antecedents high and low match the input medium. In $w_1$, the conjunction high $\wedge$ 
medium represents the intersection of the possibility distributions of high and medium, 
and the supremum is taken over the domain of high and medium. The same applies to 
$w_2$. 
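
The interpolation between the two rules can be sketched numerically. All membership functions below (linear shapes on a 0 to 10 scale for both pressure and volume) and the sampling grid are assumptions chosen for illustration; only the matching and aggregation scheme follows the example in the text.

```python
# Illustrative membership functions on an assumed 0..10 scale.
def low(p):
    return max(0.0, 1.0 - p / 5.0)


def high(p):
    return max(0.0, (p - 5.0) / 5.0)


def medium(p):
    return max(0.0, 1.0 - abs(p - 5.0) / 5.0)


def small(v):
    return max(0.0, 1.0 - v / 5.0)


def large(v):
    return max(0.0, (v - 5.0) / 5.0)


PRESSURES = [i / 2 for i in range(21)]  # sample points 0.0, 0.5, ..., 10.0


def match(antecedent, observed):
    """Degree of match: sup over the domain of min(antecedent, observed)."""
    return max(min(antecedent(p), observed(p)) for p in PRESSURES)


w1 = match(high, medium)  # weight of the rule "volume is small if pressure is high"
w2 = match(low, medium)   # weight of the rule "volume is large if pressure is low"


def volume(v):
    """Inferred grade of volume v: (w1 AND small) + (w2 AND large), with + read as max."""
    return max(min(w1, small(v)), min(w2, large(v)))
```

With these symmetric shapes both weights come out equal, so the inferred fuzzy set for volume is a clipped blend of small and large, which is what one would expect for a medium pressure.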


Qualitative reasoning underlies many of the applications of fuzzy logic in 
the realms of control and systems analysis (Sugeno, 1985; Pospelov, 1987; Togai, 
1986). In this connection, it should be noted that fuzzy Prolog provides an effective 
knowledge representation language for qualitative reasoning (Baldwin, 1984, 1987; 
Mukaidono, 1987; Zadeh, 1989). 


MEANING AND KNOWLEDGE REPRESENTATION 

In a general setting, knowledge may be viewed as a collection of proposi- 
tions, e.g., 

Mary is young 

Pat is much taller than Mary 

overeating causes obesity 

most Swedes are blond 

tomatoes are red unless they are unripe 

usually high quality goes with high price 

if pressure is high then volume is low 

To constitute knowledge a proposition must be understood. In this sense, 
meaning and knowledge are closely interrelated. In fuzzy logic, meaning 
representation, and thus knowledge representation, is based on test-score 
semantics (Zadeh, 1978a, 1986). 

A basic idea underlying test-score semantics is that a proposition in a natur- 
al language may be viewed as a collection of elastic, or, equivalently, fuzzy con- 
straints. For example, the proposition Mary is tall represents an elastic constraint on 
the height of Mary. Similarly, the proposition Jean is blonde represents an elastic 
constraint on the color of Jean’s hair. And, the proposition most tall men are not 
very agile represents an elastic constraint on the proportion of men who are not very 
agile among tall men. 

In more concrete terms, representing the meaning of a proposition, p, 
through the use of test-score semantics involves the following steps. 

1. Identification of the variables $X_1, \ldots, X_n$ whose values are constrained by the 
proposition. Usually, these variables are implicit rather than explicit in p. 

2. Identification of the constraints $C_1, \ldots, C_m$ which are induced by p. 

3. Characterization of each constraint, $C_i$, by describing a testing procedure 
which associates with $C_i$ a test score $\tau_i$ representing the degree to which $C_i$ is 
satisfied. Usually $\tau_i$ is expressed as a number in the interval [0,1]. More 
generally, however, a test score may be a probability/possibility distribution over 
the unit interval. 

4. Aggregation of the partial test scores $\tau_1, \ldots, \tau_m$ into a smaller number of test 
scores $\bar{\tau}_1, \ldots, \bar{\tau}_k$, which are represented as an overall vector test score 
$\tau = (\bar{\tau}_1, \ldots, \bar{\tau}_k)$. In most cases k = 1, so that the overall test score is a scalar. 
We shall assume that this is the case unless an explicit statement to the 
contrary is made. 

It is important to note that, in test-score semantics, the meaning of p is 
represented not by the overall test score $\tau$ but by the procedure which leads to it. 
Viewed in this perspective, test-score semantics may be regarded as a generalization 
of truth-conditional, possible-world and model-theoretic semantics. However, by 
providing a computational framework for dealing with uncertainty and 
dispositionality — which the conventional semantical systems disregard — test-score 
semantics achieves a much higher level of expressive power and thus provides a 
basis for representing the meaning of a much wider variety of propositions in a na- 
tural language. 

In test-score semantics, the testing of the constraints induced by p is per- 
formed on a collection of fuzzy relations which constitute an explanatory database, 
or ED for short. A basic assumption which is made about the explanatory database is 
that it is comprised of relations whose meaning is known to the addressee of the 
meaning-representation process. In an indirect way, then, the testing and aggrega- 
tion procedures in test-score semantics may be viewed as a description of a process 
by which the meaning of p is composed from the meanings of the constituent rela- 
tions in the explanatory database. It is this explanatory role of the relations in ED that 
motivates its description as an explanatory database. 

As will be seen in the sequel, in describing the testing procedures we need 
not concern ourselves with the actual entries in the constituent relations. Thus, in 
general, the description of a test involves only the frames of the constituent relations, 
that is, their names, their variables (or attributes) and the domain of each variable. 

As a simple illustration of the concept of a test procedure, consider the 
proposition p $\triangleq$ Maria is young and attractive. The ED in this case will be assumed to 
consist of the following relations: 

ED $\triangleq$ POPULATION [Name; Age; $\mu$Attractive] + YOUNG [Age; $\mu$], (3.1) 

in which + should be read as "and," and $\triangleq$ stands for "denotes." 

The relation labeled POPULATION consists of a collection of triples whose 
first element is the name of an individual; whose second element is the age of that 
individual; and whose third element is the degree to which the individual in question is 
attractive. The relation YOUNG is a collection of pairs whose first element is a value 
of the variable Age and whose second element is the degree to which that value of 
Age satisfies the elastic constraint characterized by the fuzzy predicate young. In 
effect, this relation serves to calibrate the meaning of the fuzzy predicate young in a 
particular context by representing its denotation as a fuzzy subset, YOUNG, of the 
interval [0,100]. 

With this ED, the test procedure which computes the overall test score may 
be described as follows: 

1. Determine the age of Maria by reading the value of Age in POPULATION, with 
the variable Name bound to Maria. In symbols, this may be expressed as 

$$\text{Age}(\text{Maria}) = {}_{\text{Age}}\text{POPULATION}[\text{Name} = \text{Maria}] .$$

In this expression, we use the notation ${}_{Y}R[X = a]$ to signify that X is bound to 
a in R and the resulting relation is projected on Y, yielding the values of Y in 
the tuples in which X = a. 

2. Test the elastic constraint induced by the fuzzy predicate young: 

$$\tau_1 = {}_{\mu}\text{YOUNG}[\text{Age} = \text{Age}(\text{Maria})] .$$

3. Determine the degree to which Maria is attractive: 

$$\tau_2 = {}_{\mu\text{Attractive}}\text{POPULATION}[\text{Name} = \text{Maria}] .$$

4. Compute the overall test score by aggregating the partial test scores $\tau_1$ and $\tau_2$. 
For this purpose, we shall use the min operator $\wedge$ as the aggregation operator, 
yielding 

$$\tau = \tau_1 \wedge \tau_2 , \qquad (3.2)$$

which signifies that the overall test score is taken to be the smaller of the 
operands of $\wedge$. The overall test score, as expressed by (3.2), represents the 
compatibility of p $\triangleq$ Maria is young and attractive with the data resident in 
the explanatory database. 
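
The four steps of this test procedure can be sketched in a few lines. The entries of POPULATION, the calibration of YOUNG, and the helper names `lookup` and `overall_test_score` are hypothetical; the paper specifies only the frames of the relations, not their contents.

```python
# Hypothetical explanatory database; the entries are illustrative assumptions.
POPULATION = [
    {"Name": "Maria", "Age": 28, "mu_attractive": 0.9},
    {"Name": "Carol", "Age": 47, "mu_attractive": 0.7},
]


def mu_young(age):
    """Assumed calibration of YOUNG [Age; mu]: 1 up to 25, linear down to 0 at 45."""
    if age <= 25:
        return 1.0
    if age >= 45:
        return 0.0
    return (45 - age) / 20


def lookup(relation, attribute, **bindings):
    """Bind the given attributes, then project the matching tuples on `attribute`."""
    return [
        row[attribute]
        for row in relation
        if all(row[key] == value for key, value in bindings.items())
    ]


def overall_test_score(name):
    """Test score for the proposition '<name> is young and attractive'."""
    age = lookup(POPULATION, "Age", Name=name)[0]             # step 1
    tau_1 = mu_young(age)                                     # step 2
    tau_2 = lookup(POPULATION, "mu_attractive", Name=name)[0]  # step 3
    return min(tau_1, tau_2)                                  # step 4, eq. (3.2)
```

Note that the meaning of the proposition is carried by the procedure `overall_test_score` itself, not by the number it returns for a particular database.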

In testing the constituent relations in ED, it is helpful to have a collection of 
standardized translation rules for computing the test score of a combination of elastic 
constraints $C_1, \ldots, C_k$ from the knowledge of the test scores of each constraint 
considered in isolation. For the most part, such rules are default rules in the sense that 
they are intended to be used in the absence of alternative rules supplied by the user. 

For purposes of knowledge representation, the principal rules of this type 
are the following. 

1. Rules pertaining to modification 

If the test score for an elastic constraint C in a specified context is $\tau$, then in 
the same context the test score for 

(a) not C is $1 - \tau$ (negation) 

(b) very C is $\tau^2$ (concentration) 

(c) more or less C is $\tau^{1/2}$ (diffusion). 

2. Rules pertaining to composition 

If the test scores for elastic constraints $C_1$ and $C_2$ in a specified context are 
$\tau_1$ and $\tau_2$, respectively, then in the same context the test score for 

(a) $C_1$ and $C_2$ is $\tau_1 \wedge \tau_2$ (conjunction), where $\wedge \triangleq$ min. 

(b) $C_1$ or $C_2$ is $\tau_1 \vee \tau_2$ (disjunction), where $\vee \triangleq$ max. 

(c) If $C_1$ then $C_2$ is $1 \wedge (1 - \tau_1 + \tau_2)$ (implication). 
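
The default modification and composition rules translate directly into small functions on test scores in [0, 1]. The function names are illustrative; the square root for more or less is the usual counterpart of squaring for very and is stated here as an assumption about the intended diffusion rule.

```python
def not_(tau):
    """Negation: 1 - tau."""
    return 1.0 - tau


def very(tau):
    """Concentration: tau squared."""
    return tau ** 2


def more_or_less(tau):
    """Diffusion: square root of tau (assumed counterpart of concentration)."""
    return tau ** 0.5


def and_(tau_1, tau_2):
    """Conjunction: min of the two test scores."""
    return min(tau_1, tau_2)


def or_(tau_1, tau_2):
    """Disjunction: max of the two test scores."""
    return max(tau_1, tau_2)


def implies(tau_1, tau_2):
    """Implication: min(1, 1 - tau_1 + tau_2)."""
    return min(1.0, 1.0 - tau_1 + tau_2)
```

For instance, the score of "not very C" for a constraint with score 0.5 would be `not_(very(0.5))`, and an implication whose consequent scores at least as high as its antecedent always evaluates to 1.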

3. Rules pertaining to quantification 

The rules in question apply to propositions of the general form 
Q A's are B's, where Q is a fuzzy quantifier, e.g., most, many, several, few, etc., 
and A and B are fuzzy sets, e.g., tall men, intelligent men, etc. As was stated earlier, 
when the fuzzy quantifiers in a proposition are implied rather than explicit, their 
suppression may be placed in evidence by referring to the proposition as a disposi- 
tion. In this sense, the proposition overeating causes obesity is a disposition which 
results from the suppression of the fuzzy quantifier most in the proposition most of 
those who overeat are obese. 

To make the concept of a fuzzy quantifier meaningful, it is necessary to 
define a way of counting the number of elements in a fuzzy set or, equivalently, to 
determine its cardinality. 

There are several ways in which this can be done (Zadeh, 1978a; Dubois 
and Prade, 1985; Yager, 1980). For our purposes, it will suffice to employ the con- 
cept of a sigma-count , which is defined as follows: 

Let F be a fuzzy subset of $U = \{u_1, \ldots, u_n\}$ expressed symbolically as 

$$F = \mu_1/u_1 + \cdots + \mu_n/u_n = \Sigma_i\, \mu_i/u_i$$

or, more simply, as 

$$F = \mu_1 u_1 + \cdots + \mu_n u_n ,$$

in which the term $\mu_i/u_i$, $i = 1, \ldots, n$, signifies that $\mu_i$ is the grade of membership of 
$u_i$ in F, and the plus sign represents the union. 

The sigma-count of F is defined as the arithmetic sum of the $\mu_i$, i.e., 

$$\Sigma\text{Count}(F) \triangleq \Sigma_i\, \mu_i , \quad i = 1, \ldots, n ,$$

with the understanding that the sum may be rounded, if need be, to the nearest in- 
teger. Furthermore, one may stipulate that the terms whose grade of membership 
falls below a specified threshold be excluded from the summation. The purpose of 
such an exclusion is to avoid a situation in which a large number of terms with low 
grades of membership become count-equivalent to a small number of terms with 
high membership. 

The relative sigma-count, denoted by $\Sigma\text{Count}(F/G)$, may be interpreted as 
the proportion of elements of G which are also in F. More explicitly, 

$$\Sigma\text{Count}(F/G) = \frac{\Sigma\text{Count}(F \cap G)}{\Sigma\text{Count}(G)} ,$$

where $F \cap G$, the intersection of F and G, is defined by 

$$\mu_{F \cap G}(u) = \mu_F(u) \wedge \mu_G(u) , \quad u \in U .$$

Thus, in terms of the membership functions of F and G, the relative sigma-count 
of F in G is given by 

$$\Sigma\text{Count}(F/G) = \frac{\Sigma_i\, \mu_F(u_i) \wedge \mu_G(u_i)}{\Sigma_i\, \mu_G(u_i)} .$$
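
A minimal sketch of the sigma-count and the relative sigma-count follows, with fuzzy sets represented as dictionaries from elements to grades of membership. The representation, the optional `threshold` argument (mirroring the exclusion of low grades mentioned above), and the error raised for an empty G are implementation assumptions.

```python
def sigma_count(F, threshold=0.0):
    """Arithmetic sum of the grades of membership in F.

    Grades below `threshold` are excluded from the summation.
    """
    return sum(mu for mu in F.values() if mu >= threshold)


def relative_sigma_count(F, G):
    """SigmaCount(F/G) = SigmaCount(F intersect G) / SigmaCount(G), with min as intersection."""
    count_g = sigma_count(G)
    if count_g == 0:
        raise ValueError("relative sigma-count is undefined when SigmaCount(G) is 0")
    count_fg = sum(min(F.get(u, 0.0), mu_g) for u, mu_g in G.items())
    return count_fg / count_g


# Illustrative fuzzy sets over a small universe.
F = {"a": 1.0, "b": 0.5, "c": 0.25}
G = {"a": 0.5, "b": 0.5, "d": 1.0}
```

Here `relative_sigma_count(F, G)` divides the count of the intersection by the count of G, so elements of F that have no membership in G do not affect the result.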
The concept of a relative sigma-count provides a basis for interpreting the 
meaning of propositions of the form Q A's are B's, e.g., most young men are 
healthy. More specifically, if the focal variable (i.e., the constrained variable) in the 
proposition in question is taken to be the proportion of B's in A's, then the 
corresponding translation rule may be expressed as 

$$Q\ A\text{'s are } B\text{'s} \to \Sigma\text{Count}(B/A) \text{ is } Q .$$

As an illustration, consider the proposition p = over the past few years 
Naomi earned far more than most of her close friends. In this case, we shall assume 
that the constituent relations in the explanatory database are: 

ED $\triangleq$ INCOME [Name; Amount; Year] + 
FRIEND [Name; $\mu$] + 
FEW [Number; $\mu$] + 
FAR.MORE [Income1; Income2; $\mu$] + 
MOST [Proportion; $\mu$]. 

Note that some of these relations are explicit in p; some are not; and that 
most of the constituent words in p do not appear in ED. 

In what follows, we shall describe the process by which the meaning of p 
may be composed from the meaning of the constituent relations in ED. Basically, 
this process is a test procedure which tests, scores and aggregates the elastic con- 
straints which are induced by p. 

1. Find Naomi's income, $IN_i$, in $Year_i$, i = 1, 2, 3, ..., counting backward from 
present. In symbols, 

$$IN_i \triangleq {}_{\text{Amount}}\text{INCOME}[\text{Name} = \text{Naomi};\ \text{Year} = Year_i] ,$$

which signifies that Name is bound to Naomi, Year to $Year_i$, and the resulting 
relation is projected on the domain of the attribute Amount, yielding the value 
of Amount corresponding to the values assigned to the attributes Name and 
Year. 

2. Test the constraint induced by FEW: 

$$\mu_i \triangleq {}_{\mu}\text{FEW}[\text{Year} = Year_i] ,$$

which signifies that the variable Year is bound to $Year_i$ and the corresponding 
value of $\mu$ is read by projecting on the domain of $\mu$. 

3. Compute Naomi's total income during the past few years: 

$$TIN = \Sigma_i\, \mu_i\, IN_i ,$$

in which the $\mu_i$ play the role of weighting coefficients. Thus, we are tacitly 
assuming that the total income earned by Naomi during a fuzzily specified 
interval of time is obtained by weighting Naomi's income in year $Year_i$ by the 
degree to which $Year_i$ satisfies the constraint induced by FEW and summing the 
weighted incomes. 

4. Compute the total income of each $Name_j$ (other than Naomi) during the past 
few years: 

$$TIName_j = \Sigma_i\, \mu_i\, IName_{ji} ,$$

where $IName_{ji}$ is the income of $Name_j$ in $Year_i$. 
5. Find the fuzzy set of individuals in relation to whom Naomi earned far more. 
The grade of membership of $Name_j$ in this set is given by 

$$\mu_{FM}(Name_j) = {}_{\mu}\text{FAR.MORE}[\text{Income1} = TIN;\ \text{Income2} = TIName_j] .$$

6. Find the fuzzy set of close friends of Naomi by intensifying (Zadeh, 1978a) 
the relation FRIEND: 

$$CF \triangleq \text{CLOSE.FRIEND} \triangleq {}^{2}\text{FRIEND} ,$$

which implies that 

$$\mu_{CF}(Name_j) = \left( {}_{\mu}\text{FRIEND}[\text{Name} = Name_j] \right)^2 ,$$

where the expression 

$${}_{\mu}\text{FRIEND}[\text{Name} = Name_j]$$

represents $\mu_F(Name_j)$, that is, the grade of membership of $Name_j$ in the set of 
Naomi's friends. 

7. Count the number of close friends of Naomi. On denoting the count in 
question by $\Sigma\text{Count}(CF)$, we have: 

$$\Sigma\text{Count}(CF) = \Sigma_j\, \mu^2_{\text{FRIEND}}(Name_j) .$$

8. Find the intersection of FM with CF. The grade of membership of $Name_j$ in the 
intersection is given by 

$$\mu_{FM \cap CF}(Name_j) = \mu_{FM}(Name_j) \wedge \mu_{CF}(Name_j) ,$$

where the min operator $\wedge$ signifies that the intersection is defined as the 
conjunction of its operands. 

9. Compute the sigma-count of $FM \cap CF$: 

$$\Sigma\text{Count}(FM \cap CF) = \Sigma_j\, \mu_{FM}(Name_j) \wedge \mu_{CF}(Name_j) .$$

10. Compute the relative sigma-count of FM in CF, i.e., the proportion of 
individuals in CF who are in FM: 

$$\rho \triangleq \frac{\Sigma\text{Count}(FM \cap CF)}{\Sigma\text{Count}(CF)} .$$

11. Test the constraint induced by MOST: 

$$\tau \triangleq {}_{\mu}\text{MOST}[\text{Proportion} = \rho] ,$$

which expresses the overall test score and thus represents the compatibility of 
p with the explanatory database. 
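
The eleven steps can be sketched end to end on a toy database. Every entry below (the incomes, the grades in FEW and FRIEND, and the shapes chosen for FAR.MORE and MOST) is a hypothetical calibration introduced for illustration; only the sequence of operations follows the procedure in the text. Individuals absent from FRIEND are skipped because their grade in CF is 0 and they contribute nothing to either count.

```python
# Hypothetical explanatory database (all values are illustrative assumptions).
FEW = {1: 1.0, 2: 1.0, 3: 0.5}  # years counted back from present -> mu_i
INCOME = {
    "Naomi": {1: 100, 2: 100, 3: 100},
    "Ann": {1: 50, 2: 50, 3: 50},
    "Bob": {1: 100, 2: 100, 3: 100},
}
FRIEND = {"Ann": 1.0, "Bob": 0.5}  # grade of membership in Naomi's friends


def far_more(x, y):
    """Assumed FAR.MORE: grade 0 when x equals y, rising to 1 when x is twice y."""
    return max(0.0, min(1.0, x / y - 1.0))


def most(proportion):
    """Assumed MOST: grade 0 up to 0.5, rising linearly to 1 at 1.0."""
    return max(0.0, min(1.0, 2.0 * proportion - 1.0))


def total_income(name):
    """Steps 1 to 4: incomes weighted by the degree to which each year is 'few'."""
    return sum(FEW[year] * INCOME[name][year] for year in FEW)


def naomi_test_score():
    tin = total_income("Naomi")
    fm = {n: far_more(tin, total_income(n)) for n in FRIEND}  # step 5
    cf = {n: mu ** 2 for n, mu in FRIEND.items()}             # step 6
    count_cf = sum(cf.values())                               # step 7
    count_fm_cf = sum(min(fm[n], cf[n]) for n in cf)          # steps 8 and 9
    rho = count_fm_cf / count_cf                              # step 10
    return most(rho)                                          # step 11
```

In this toy database Ann counts fully as a close friend and Bob only to degree 0.25, so Naomi earning far more than Ann alone already yields a fairly high proportion.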

In application to the representation of dispositional knowledge, the first step 
in the representation of the meaning of a disposition involves the process of 
explicitation, that is, making explicit the implicit quantifiers. As a simple example, 
consider the disposition 

d $\triangleq$ young men like young women 

which may be interpreted as the proposition 

p $\triangleq$ most young men like mostly young women. 



The candidate ED for p is assumed to consist of the following relations: 

ED $\triangleq$ POPULATION [Name; Sex; Age] + 
LIKE [Name1; Name2; $\mu$] + 
MOST [Proportion; $\mu$], 

in which $\mu$ in LIKE is the degree to which Name1 likes Name2. 

To represent the meaning of p, it is expedient to replace p with the semanti- 
cally equivalent proposition 

q $\triangleq$ most young men are P, 

where P is the fuzzy dispositional predicate 

P $\triangleq$ likes mostly young women. 

In this way, the representation of the meaning of p is decomposed into two simpler 
problems, namely, the representation of the meaning of P, and the representation of 
the meaning of q knowing the meaning of P. 

The meaning of P is represented by the following test procedure. 

1. Divide POPULATION into the population of males, M.POPULATION, and the 
population of females, F.POPULATION: 

$$\text{M.POPULATION} \triangleq {}_{\text{Name, Age}}\text{POPULATION}[\text{Sex} = \text{Male}]$$

$$\text{F.POPULATION} \triangleq {}_{\text{Name, Age}}\text{POPULATION}[\text{Sex} = \text{Female}] ,$$

where ${}_{\text{Name, Age}}\text{POPULATION}$ denotes the projection of POPULATION on the 
attributes Name and Age. 

2. For each $Name_j$, j = 1, 2, ..., in F.POPULATION, find the age of $Name_j$: 

$$A_j \triangleq {}_{\text{Age}}\text{F.POPULATION}[\text{Name} = Name_j] .$$

3. For each $Name_j$, find the degree to which $Name_j$ is young: 

$$\alpha_j \triangleq {}_{\mu}\text{YOUNG}[\text{Age} = A_j] ,$$

where $\alpha_j$ may be interpreted as the grade of membership of $Name_j$ in the 
fuzzy set, YW, of young women. 

4. For each $Name_i$, i = 1, ..., k, in M.POPULATION, find the age of $Name_i$: 

$$B_i \triangleq {}_{\text{Age}}\text{M.POPULATION}[\text{Name} = Name_i] .$$

5. For each $Name_i$, find the degree to which $Name_i$ is young: 

$$\delta_i \triangleq {}_{\mu}\text{YOUNG}[\text{Age} = B_i] ,$$

where $\delta_i$ may be interpreted as the grade of membership of $Name_i$ in the 
fuzzy set, YM, of young men. 

6. For each $Name_j$, find the degree to which $Name_i$ likes $Name_j$: 

$$\beta_{ij} \triangleq {}_{\mu}\text{LIKE}[\text{Name1} = Name_i;\ \text{Name2} = Name_j] ,$$

with the understanding that $\beta_{ij}$ may be interpreted as the grade of membership 
of $Name_j$ in the fuzzy set, $WL_i$, of women whom $Name_i$ likes. 

7. For each $Name_j$ find the degree to which $Name_i$ likes $Name_j$ and $Name_j$ is 
young: 

$$\gamma_{ij} \triangleq \alpha_j \wedge \beta_{ij} .$$

Note: As in previous examples, we employ the aggregation operator min ($\wedge$) 
to represent the effect of conjunction. In effect, $\gamma_{ij}$ is the grade of membership 
of $Name_j$ in the intersection of the fuzzy sets $WL_i$ and YW. 

8. Compute the relative sigma-count of young women among the women whom 
$Name_i$ likes: 

$$\rho_i \triangleq \Sigma\text{Count}(YW/WL_i) = \frac{\Sigma\text{Count}(YW \cap WL_i)}{\Sigma\text{Count}(WL_i)} = \frac{\Sigma_j\, \gamma_{ij}}{\Sigma_j\, \beta_{ij}} = \frac{\Sigma_j\, \alpha_j \wedge \beta_{ij}}{\Sigma_j\, \beta_{ij}} .$$

9. Test the constraint induced by MOST: 

$$\tau_i \triangleq {}_{\mu}\text{MOST}[\text{Proportion} = \rho_i] .$$

This test score, then, represents the degree to which $Name_i$ has the property 
expressed by the predicate 

P $\triangleq$ likes mostly young women. 

Continuing the test procedure, we have: 

10. Compute the relative sigma-count of men who have property P among young 
men: 

$$\rho \triangleq \Sigma\text{Count}(P/YM) = \frac{\Sigma\text{Count}(P \cap YM)}{\Sigma\text{Count}(YM)} = \frac{\Sigma_i\, \tau_i \wedge \delta_i}{\Sigma_i\, \delta_i} .$$

11. Test the constraint induced by MOST: 

$$\tau = {}_{\mu}\text{MOST}[\text{Proportion} = \rho] .$$

This test score represents the overall test score for the disposition young men 
like young women. 
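
The nested structure of this procedure (an inner relative sigma-count per man, then an outer one over young men) can be sketched as follows. The population, the LIKE grades, and the calibrations of YOUNG and MOST are hypothetical, and returning a proportion of 0 when a denominator is 0 is an added assumption that the paper does not address.

```python
# Hypothetical explanatory database (all values are illustrative assumptions).
POPULATION = [
    {"Name": "Al", "Sex": "Male", "Age": 20},
    {"Name": "Ed", "Sex": "Male", "Age": 35},
    {"Name": "Bea", "Sex": "Female", "Age": 22},
    {"Name": "Dot", "Sex": "Female", "Age": 50},
]
LIKE = {("Al", "Bea"): 1.0, ("Al", "Dot"): 0.0,
        ("Ed", "Bea"): 0.5, ("Ed", "Dot"): 0.5}


def young(age):
    """Assumed YOUNG: grade 1 up to 25, falling linearly to 0 at 45."""
    return max(0.0, min(1.0, (45 - age) / 20))


def most(proportion):
    """Assumed MOST: grade 0 up to 0.5, rising linearly to 1 at 1.0."""
    return max(0.0, min(1.0, 2.0 * proportion - 1.0))


def ratio(numerator, denominator):
    """Relative sigma-count, taken to be 0 when the denominator is 0 (assumption)."""
    return numerator / denominator if denominator > 0 else 0.0


def disposition_test_score():
    men = {r["Name"]: r["Age"] for r in POPULATION if r["Sex"] == "Male"}      # step 1
    women = {r["Name"]: r["Age"] for r in POPULATION if r["Sex"] == "Female"}
    alpha = {w: young(age) for w, age in women.items()}                        # steps 2, 3
    delta = {m: young(age) for m, age in men.items()}                          # steps 4, 5
    tau_i = {}
    for m in men:
        beta = {w: LIKE.get((m, w), 0.0) for w in women}                       # step 6
        gamma_sum = sum(min(alpha[w], beta[w]) for w in women)                 # step 7
        rho_i = ratio(gamma_sum, sum(beta.values()))                           # step 8
        tau_i[m] = most(rho_i)                                                 # step 9
    rho = ratio(sum(min(tau_i[m], delta[m]) for m in men),
                sum(delta.values()))                                           # step 10
    return most(rho)                                                           # step 11
```

In this toy population Al has property P fully while Ed does not, and since Ed is young only to degree 0.5, the resulting proportion lies between one half and one.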

THE CONCEPT OF A CANONICAL FORM AND ITS 
APPLICATION TO THE REPRESENTATION OF MEANING 

When the meaning of a proposition, p , is represented as a test procedure, it 
may be hard to discern in the description of the procedure the underlying structure of 
the process through which the meaning of p is constructed from the meanings of the 
constituent relations in the explanatory database. 

A concept which makes it easier to perceive the logical structure of p, and 
thus to develop a better understanding of the meaning representation process, is that 
of a canonical form of p, abbreviated as cf(p) (Zadeh, 1978b, 1986). 



The concept of a canonical form relates to the basic idea which underlies 
test-score semantics, namely, that a proposition may be viewed as a system of elastic 
constraints whose domain is a collection of relations in the explanatory database. 
Equivalently, let $X_1, \ldots, X_n$ be a collection of variables which are constrained by p. 
Then, the canonical form of p may be expressed as 

$$\text{cf}(p) \triangleq X \text{ is } F , \qquad (4.1)$$

where $X = (X_1, \ldots, X_n)$ is the constrained variable which is usually implicit in p, 
and F is a fuzzy relation, likewise implicit in p, which plays the role of an elastic (or 
fuzzy) constraint on X. The relation between p and its canonical form will be 
expressed as 

$$p \to X \text{ is } F , \qquad (4.2)$$

signifying that the canonical form may be viewed as a representation of the meaning 
of p. 

In general, the constrained variable X in cf (p ) is not uniquely determined 
by p , and is dependent on the focus of attention in the meaning-representation pro- 
cess. To place this in evidence, we shall refer to X as the focal variable. 

As a simple illustration, consider the proposition 

$$p \triangleq \text{Anne has blue eyes} . \qquad (4.3)$$

In this case, the focal variable may be expressed as

$$X \triangleq Color(Eyes(Anne)) ,$$

and the elastic constraint is represented by the fuzzy relation BLUE. Thus, we can
write

$$p \rightarrow Color(Eyes(Anne)) \text{ is } BLUE . \qquad (4.4)$$

As an additional illustration, consider the proposition

$$p \triangleq \text{Brian is much taller than Mildred} . \qquad (4.5)$$

Here, the focal variable has two components, $X = (X_1, X_2)$, where

$$X_1 = Height(Brian)$$

$$X_2 = Height(Mildred) ;$$

and the elastic constraint is characterized by the fuzzy relation MUCH.TALLER
[Height1; Height2; $\mu$], in which $\mu$ is the degree to which Height1 is much taller
than Height2. In this case, we have

$$p \rightarrow (Height(Brian), Height(Mildred)) \text{ is } MUCH.TALLER . \qquad (4.6)$$

In terms of the possibility distribution of X, the canonical form of p may be
interpreted as the assignment of F to $\Pi_X$. Thus, we may write

$$p \rightarrow X \text{ is } F \rightarrow \Pi_X = F , \qquad (4.7)$$

in which the equation

$$\Pi_X = F \qquad (4.8)$$

is termed the possibility assignment equation (Zadeh 1978b). In effect, this equation
signifies that the canonical form $cf(p) \triangleq X$ is F implies that

$$Poss\{X = u\} = \mu_F(u) , \quad u \in U , \qquad (4.9)$$

where $\mu_F$ is the membership function of F. It is in this sense that F, acting as an
elastic constraint on X, restricts the possible values which X can take in U. An
important implication of this observation is that a proposition, p, may be interpreted as
an implicit assignment statement which characterizes the possibility distribution of
the focal variable in p.

As an illustration, consider the disposition 

$$d \triangleq \text{overeating causes obesity} , \qquad (4.10)$$

which upon explication becomes

$$p \triangleq \text{most of those who overeat are obese} . \qquad (4.11)$$

If the focal variable in this case is chosen to be the relative sigma-count of
those who are obese among those who overeat, the canonical form of p becomes

$$\Sigma Count(OBESE/OVEREAT) \text{ is } MOST , \qquad (4.12)$$

which in virtue of (4.9) implies that

$$Poss\{\Sigma Count(OBESE/OVEREAT) = u\} = \mu_{MOST}(u) , \qquad (4.13)$$

where $\mu_{MOST}$ is the membership function of MOST. What is important to note is that
(4.13) is equivalent to the assertion that the overall test score for p is expressed by

$$\tau = \mu_{MOST}(\Sigma Count(OBESE/OVEREAT)) , \qquad (4.14)$$

in which OBESE, OVEREAT and MOST play the roles of the constituent relations in ED.

It is of interest to observe that the notion of a semantic network may be 
viewed as a special case of the concept of a canonical form. As a simple illustration, 
consider the proposition 

p £ Richard gave Cindy a red pin . (4.15) 

As a semantic network, this proposition may be represented in the standard form:

    Agent (GIVE) = Richard                  (4.16)
    Recipient (GIVE) = Cindy
    Time (GIVE) = Past
    Object (GIVE) = Pin
    Color (Pin) = Red .

Now, if we identify $X_1$ with Agent (GIVE), $X_2$ with Recipient (GIVE), etc., the
semantic network representation (4.16) may be regarded as a canonical form in which
$X = (X_1, \ldots, X_5)$, and

    $X_1$ = Richard                         (4.17)
    $X_2$ = Cindy
    $X_3$ is Past
    $X_4$ is Pin
    $X_5$ is Red .



More generally, since any semantic network may be expressed as a collection of tri- 
ples of the form (Object, Attribute, Attribute Value), it can be transformed at once 
into a canonical form. However, since a canonical form has a much greater expres- 
sive power than a semantic network, it may be difficult to transform a canonical 
form into a semantic network. 

INFERENCE 

The concept of a canonical form provides a convenient framework for 
representing the rules of inference in fuzzy logic. Since the main concern of the pa- 
per is with knowledge representation rather than with inference, our discussion of 
the rules of inference in fuzzy logic in this section has the format of a summary. 

In the so-called categorical rules of inference, the premises are assumed to
be in the canonical form X is A or the conditional canonical form X is A if Y is B,
where A and B are fuzzy predicates (or relations). In the syllogistic rules, the
premises are expressed as Q A's are B's, where Q is a fuzzy quantifier and A and B are
fuzzy predicates (or relations).

The rules in question are the following.

CATEGORICAL RULES

$X, Y, Z, \ldots \triangleq$ variables taking values in $U, V, W, \ldots$

Examples

$X \triangleq Age(Mary)$, $Y \triangleq Distance(P1, P2)$

$A, B, C, \ldots \triangleq$ fuzzy predicates (relations)

Examples

$A \triangleq$ small, $B \triangleq$ much larger


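ENTAILMENT RULE

    X is A
    $A \subset B \;\rightarrow\; \mu_A(u) \le \mu_B(u), \; u \in U$
    ----------------------------------------
    X is B

Example

    Mary is very young
    very young $\subset$ young
    ----------------------------------------
    Mary is young

CONJUNCTION RULE

    X is A
    X is B
    ----------------------------------------
    X is $A \cap B$ $\;\rightarrow\; \mu_{A \cap B}(u) = \mu_A(u) \wedge \mu_B(u)$

    $\cap \triangleq$ intersection (conjunction)

Example

    pressure is not very high
    pressure is not very low
    ----------------------------------------
    pressure is not very high and not very low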

DISJUNCTION RULE

    X is A
    or X is B
    ----------------------------------------
    X is $A \cup B$ $\;\rightarrow\; \mu_{A \cup B}(u) = \mu_A(u) \vee \mu_B(u)$

    $\cup \triangleq$ union (disjunction)

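PROJECTION RULE

    (X,Y) is R
    ----------------------------------------
    X is ${}_X R$ $\;\rightarrow\; \mu_{{}_X R}(u) = \sup_v \mu_R(u, v)$

    ${}_X R \triangleq$ projection of R on U

Example

    (X,Y) is close to (3,2)
    ----------------------------------------
    X is close to 3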

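COMPOSITIONAL RULE

    (X,Y) is R     (R $\rightarrow$ binary predicate)
    Y is B
    ----------------------------------------
    X is $R \circ B$ $\;\rightarrow\; \mu_{R \circ B}(u) = \sup_v (\mu_R(u, v) \wedge \mu_B(v))$

Example

    X is much larger than Y
    Y is large
    ----------------------------------------
    X is much larger $\circ$ large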



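NEGATION RULE

    not (X is A)
    ----------------------------------------
    X is $\neg A$ $\;\rightarrow\; \mu_{\neg A}(u) = 1 - \mu_A(u)$

    $\neg \triangleq$ negation

Example

    not (Mary is young)
    ----------------------------------------
    Mary is not young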


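EXTENSION PRINCIPLE

    X is A
    ----------------------------------------
    f(X) is f(A)

    $A = \mu_1/u_1 + \mu_2/u_2 + \cdots + \mu_n/u_n$

    $f(A) = \mu_1/f(u_1) + \mu_2/f(u_2) + \cdots + \mu_n/f(u_n)$

Example

    X is small
    ----------------------------------------
    $X^2$ is ${}^2small$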

small £ very small,^^,, = ( n^,, f 

It should be noted that the use of the canonical form in these rules stands
in sharp contrast to the way in which the rules of inference are expressed in classical 
logic. The advantage of the canonical form is that it places in evidence that infer- 
ence in fuzzy logic may be interpreted as a propagation of elastic constraints. This 
point of view is particularly useful in the applications of fuzzy logic to control and 
decision analysis ( Proc . of the 2nd IFSA Congress , 1987, Proc. of the International 
Workshop, Iizuka, 1988). 

As was pointed out already, it is the qualitative mode of reasoning that 
plays a key role in the applications of fuzzy logic to control. In such applications, 
the input-output relations are expressed as collections of fuzzy if-then rules (Mam- 
dani and Gaines, 1981). 

For example, if X and Y are input variables and Z is the output variable, the
relation between X, Y, and Z may be expressed as

    Z is $C_1$ if X is $A_1$ and Y is $B_1$
    Z is $C_2$ if X is $A_2$ and Y is $B_2$
    ...
    Z is $C_n$ if X is $A_n$ and Y is $B_n$

where $C_i$, $A_i$, and $B_i$, $i = 1, \ldots, n$, are fuzzy subsets of their respective universes of
discourse. For example,

    Z is small if X is large and Y is medium
    Z is not large if X is very small and Y is not large

Given a characterization of the dependence of Z on X and Y in this form,
one can employ the compositional rule of inference to compute the value of Z given
the values of X and Y. This is what underlies the Togai-Watanabe fuzzy logic chip
(Togai, 1986) and the operation of fuzzy logic controllers in industrial process
control (Sugeno, 1985).
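
As a rough illustration of how a collection of such if-then rules yields a value of Z, the following sketch evaluates two rules for crisp inputs. It assumes a min-max reading of the rules (min for "and" and for applying a rule, max for combining rules), which is one common choice in fuzzy control and is not prescribed by the text; the triangular membership functions and the universe [0, 10] are likewise hypothetical.

```python
# Sketch of rule evaluation for crisp inputs under an assumed min-max interpretation.

def tri(a, b, c):
    """Triangular membership function rising from a to a peak at b, falling to c (assumed shape)."""
    def mu(u):
        if u <= a or u >= c:
            return 0.0
        if u <= b:
            return (u - a) / (b - a)
        return (c - u) / (c - b)
    return mu

# Hypothetical linguistic values on the universe [0, 10].
small = tri(-5, 0, 5)
medium = tri(0, 5, 10)
large = tri(5, 10, 15)

# Each rule reads: Z is C if X is A and Y is B, stored as (A, B, C).
rules = [
    (large, medium, small),   # Z is small if X is large and Y is medium
    (small, medium, large),   # Z is large if X is small and Y is medium
]

def infer(rules, x0, y0, z_grid):
    """Return the membership grade of each z in z_grid in the inferred fuzzy value of Z."""
    out = []
    for z in z_grid:
        grade = 0.0
        for A, B, C in rules:
            strength = min(A(x0), B(y0))             # degree to which the antecedent holds
            grade = max(grade, min(strength, C(z)))  # combine the contributions of the rules
        out.append(grade)
    return out

print(infer(rules, 10, 5, [0, 2.5, 5]))
```

With X = 10 and Y = 5 only the first rule applies, so the inferred value of Z coincides with small on the sampled points.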

In general, the applications of fuzzy logic in systems and process control
fall into two categories. First, there are those applications in which, in comparison
with traditional methods, fuzzy logic control offers the advantage of greater
simplicity, greater robustness, and lower cost. The cement kiln control pioneered by the F.L.
Smidth Company falls into this category.

Second, there are the applications in which the traditional methods provide no
solution. The self-parking fuzzy car conceived by Sugeno (Sugeno, 1985) is a prime
example of a task which humans can do so easily and which is so difficult to emulate by the
traditional approaches to systems control.

SYLLOGISTIC RULES 

In its generic form, a fuzzy syllogism may be expressed as the inference schema

    $Q_1$ A's are B's
    $Q_2$ C's are D's
    ----------------------------------------
    $Q_3$ E's are F's

in which A, B, C, D, E and F are interrelated fuzzy predicates and $Q_1$, $Q_2$ and $Q_3$ are
fuzzy quantifiers.

The interrelations between A, B, C, D, E and F provide a basis for a
classification of fuzzy syllogisms. The more important of these syllogisms are the
following:

(a) Intersection/product syllogism:

    $C = A \wedge B, \; E = A, \; F = C \wedge D$

(b) Chaining syllogism:

    $C = B, \; E = A, \; F = D$

(c) Consequent conjunction syllogism:

    $A = C = E, \; F = B \wedge D$

(d) Consequent disjunction syllogism:

    $A = C = E, \; F = B \vee D$

(e) Antecedent conjunction syllogism:

    $B = D = F, \; E = A \wedge C$

(f) Antecedent disjunction syllogism:

    $B = D = F, \; E = A \vee C$
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In the context of expert systems, these and related syllogisms provide a set of infer- 
ence rules for combining evidence through conjunction, disjunction and chaining 
(Zadeh, 1983b). 

One of the basic problems in fuzzy syllogistic reasoning is the following:
Given A, B, C, D, E and F, find the maximally specific (i.e., most restrictive) fuzzy
quantifier $Q_3$ such that the proposition $Q_3$ E's are F's is entailed by the premises. In
the case of (a), (b) and (c), this leads to the following syllogisms:

INTERSECTION/PRODUCT SYLLOGISM.

    $Q_1$ A's are B's                                   (5.1)
    $Q_2$ (A and B)'s are C's
    ----------------------------------------
    $(Q_1 \otimes Q_2)$ A's are (B and C)'s

where $\otimes$ denotes the product in fuzzy arithmetic (Kaufmann and Gupta, 1985). It
should be noted that (5.1) may be viewed as an analog of the basic probabilistic
identity

$$p(B, C \mid A) = p(B \mid A) \, p(C \mid A, B) .$$

A concrete example of the intersection/product syllogism is the following:

    most students are young                             (5.2)
    most young students are single
    ----------------------------------------
    $most^2$ students are young and single

where $most^2$ denotes the product of the fuzzy quantifier most with itself.
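
To make the product of most with itself concrete, the sketch below computes it numerically on a grid by applying the extension principle to multiplication: the grade of w is the supremum, over all u and v with uv = w, of the smaller of the grades of u and v in most. The piecewise-linear `mu_most` is a hypothetical membership function, and the grid resolution `steps` is an arbitrary choice.

```python
# Sketch: product of the fuzzy quantifier most with itself via the extension principle.
# mu_most is an assumed membership function, not one given in the text.

def mu_most(p):
    """Assumed membership function of MOST: 0 up to 0.5, rising linearly to 1 at 0.9."""
    if p <= 0.5:
        return 0.0
    if p >= 0.9:
        return 1.0
    return (p - 0.5) / 0.4

def mu_most_squared(w, steps=1000):
    """Approximate sup over u * v = w (u, v in [0, 1]) of min(mu_most(u), mu_most(v))."""
    best = 0.0
    for k in range(1, steps + 1):
        u = k / steps
        v = w / u
        if v <= 1.0:  # keep only factorizations that stay inside the unit interval
            best = max(best, min(mu_most(u), mu_most(v)))
    return best

for w in (0.25, 0.64, 1.0):
    print(w, mu_most_squared(w))
```

Because this assumed most is monotone non-decreasing, the supremum is attained at u = v, so the grade of w in the product equals the grade of the square root of w in most. A proportion of 0.64 is therefore as compatible with the squared quantifier as 0.8 is with most.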

CHAINING SYLLOGISM.

    $Q_1$ A's are B's
    $Q_2$ B's are C's
    ----------------------------------------
    $(Q_1 \otimes Q_2)$ A's are C's

This syllogism may be viewed as a special case of the intersection/product
syllogism. It results when $B \subset A$ and $Q_1$ and $Q_2$ are monotone increasing, that is, $\ge Q_1 = Q_1$
and $\ge Q_2 = Q_2$, where $\ge Q_1$ should be read as at least $Q_1$. A simple
example of the chaining syllogism is the following:

    most students are undergraduates
    most undergraduates are single
    ----------------------------------------
    $most^2$ students are single

Note that undergraduates $\subset$ students and that in the conclusion F = single, rather
than young and single, as in (5.2).

CONSEQUENT CONJUNCTION SYLLOGISM. 

The consequent conjunction syllogism is an example of a basic syllogism
which is not a derivative of the intersection/product syllogism. Its statement may be
expressed as follows:

    $Q_1$ A's are B's                                   (5.3)
    $Q_2$ A's are C's
    ----------------------------------------
    Q A's are (B and C)'s,

where Q is a fuzzy quantifier which is defined by the inequalities

$$0 \;\textcircled{$\vee$}\; (Q_1 \oplus Q_2 \ominus 1) \le Q \le Q_1 \;\textcircled{$\wedge$}\; Q_2 \qquad (5.4)$$

in which $\textcircled{$\vee$}$, $\textcircled{$\wedge$}$, $\oplus$ and $\ominus$ are the operations of $\vee$ (max), $\wedge$ (min), + and - in fuzzy
arithmetic.

An illustration of (5.3) is provided by the example

    most students are young
    most students are single
    ----------------------------------------
    Q students are single and young

where

$$2\,most \ominus 1 \le Q \le most .$$

This expression for Q follows from (5.4) by noting that

$$most \;\textcircled{$\wedge$}\; most = most$$

and

$$0 \;\textcircled{$\vee$}\; (2\,most \ominus 1) = 2\,most \ominus 1 .$$
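
For intuition, the inequalities (5.4) can be checked in the special case where the quantifiers are crisp proportions rather than fuzzy numbers. The sketch below covers only that special case, with hypothetical function names and made-up proportions; it does not implement fuzzy arithmetic.

```python
# Sketch: crisp special case of (5.4). If a proportion q1 of A's are B's and a
# proportion q2 of A's are C's, the proportion q of A's that are both B and C
# satisfies max(0, q1 + q2 - 1) <= q <= min(q1, q2).

def conjunction_bounds(q1, q2):
    """Return (lower, upper) bounds on the proportion of A's that are both B's and C's."""
    lower = max(0.0, q1 + q2 - 1.0)
    upper = min(q1, q2)
    return lower, upper

def proportion_both(members, b_set, c_set):
    """Proportion of members belonging to both b_set and c_set."""
    both = sum(1 for m in members if m in b_set and m in c_set)
    return both / len(members)

# Hypothetical population of 10 students: 8 are young and 7 are single.
students = list(range(10))
young = set(range(0, 8))     # students 0..7
single = set(range(3, 10))   # students 3..9

lo, hi = conjunction_bounds(len(young) / 10, len(single) / 10)
q = proportion_both(students, young, single)
print(lo, q, hi)
```

In this example the overlap is as small as the two counts permit, so the actual proportion sits at the lower bound, up to floating-point rounding.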

The three basic syllogisms stated above are merely examples of a collection 
of fuzzy syllogisms which may be developed and employed for purposes of infer- 
ence from commonsense knowledge. In addition to its application to commonsense 
reasoning, fuzzy syllogistic reasoning may serve to provide a basis for combining 
uncertain evidence in expert systems (Zadeh, 1983b). 

CONCLUDING REMARKS 

One of the basic aims of fuzzy logic is to provide a computational frame- 
work for knowledge representation and inference in an environment of uncertainty 
and imprecision. In such environments, fuzzy logic is effective when the solutions 
need not be precise and/or it is acceptable for a conclusion to have a dispositional 
rather than categorical validity. The importance of fuzzy logic derives from the fact 
that there are many real world applications which fit these conditions, especially in 
the realm of knowledge-based systems for decision-making and control. 
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ABSTRACT 

We show how the theory of approximate reasoning developed by L.A. Zadeh 
provides a natural format for representing the knowledge and performing the 
inferences in rule based expert systems. We extend the representational ability of 
these systems by providing a new structure for including rules which only require the 
satisfaction to some subset of the antecedent conditions. This is accomplished by the 
use of fuzzy quantifiers. We also provide a methodology for the inclusion of a form 
of uncertainty in the expert systems associated with the belief attributed to the data 
and production rules. 

INTRODUCTION 

In [1] Buchanan and Duda provide an excellent introduction to the principles
of rule-based expert systems. In [2] Buchanan provides a bibliography on expert
systems. A particularly well cited example of a rule-based expert system is MYCIN
[3,4]. In [5] Van Melle has abstracted the basic structure of the MYCIN system and
provided a language for the development of prototypical rule-based expert systems
called EMYCIN.

A rule based expert system is essentially an example of a production system 
consisting of the following components [1]: 

1. A rule or knowledge base - This consists of the expert's knowledge
in the form of conditional statements, each consisting of
an antecedent portion and a consequent portion. Typical rules are of the form

if antecedent then consequent.

2. A problem or global database - This consists of a set of facts or 
assertions about the current problem. 

3. A rule interpreter - This consists of the portion of the system that
carries out the problem solving. The rule interpreter can be considered to have two
components. The first component consists of the inference mechanism. This helps
to determine when a particular rule is valid and what is the effect of applying this
rule. The second component consists of some meta-rules which help determine in
which order the rule base is to be searched for applicability of rules.

The information in the problem database as well as the antecedents and
consequents of the rules are of the form the (attribute) of (object) is (value). An
optional certainty measure can be assigned to these propositions.

The expert system is generally activated by the introduction of a problem to
be solved in the form of a goal to be satisfied as well as the insertion in the problem
database of data about the current problem. In many cases the expert system is a
pattern-directed or forward-chaining production system. In this type of situation the
problem is initiated with the insertion of the goal state and the data, and the rule base is then
searched to find which rules can be applied based upon the information in the
problem database. A rule is applicable if the information in the global database
satisfies the antecedent portion of the rule. If a rule is fired the appropriate
information is added to the global database, forming a new augmented database. One
sees this in essence as being an application of modus ponens. The rule base is then
again searched for fireable rules using this new augmented global database, again
adding new information to the database. The meta-knowledge in the rule interpreter
is used to help direct the search for fireable rules. The determination of good
heuristics for searching the rule base plays a significant role in the intelligent aspects
of the system. The process of firing rules continues until no new information can be
added or a goal state is reached.

In this paper we are concerned with the question of representation of the 
propositions forming the information in the global database and the antecedents and 
consequents in rules, as well as the inference mechanism used to infer the 
consequence of fired rules. We shall also provide a format for the representation of 
complex rules in which only some of the antecedents need to be satisfied. 
Furthermore, we shall provide a mechanism for the inclusion of some forms of 
uncertainty. In particular we shall suggest that the theory of approximate reasoning 
based upon fuzzy subsets developed by L.A. Zadeh provides a very robust 
methodology for representing the propositions and implementing the appropriate 
inference [6-10]. We note that Yager [11,12] has provided an approach for querying
large knowledge bases of the type found in expert systems where the information is 
described in terms of fuzzy propositions. 

REPRESENTATION OF DATA AND RULES 

As noted by Buchanan and Duda [1] the fundamental building blocks for the
information in both the database and the rule base of an expert system are
propositional statements of the form:

the (attribute) of (object) is (value)

For example

The height of John is 6 feet.
The temperature of the patient is 102.

One can combine the ideas of attribute and object into a concept called a variable.
Thus in the above examples John's height and the patient's temperature can be
considered variables. In this notation the fundamental building blocks of the
rule-based expert systems would be

V is A,

where V is a variable, attribute (object), and A is its current value.

It is at this point that we diverge from the current representational approach to
expert systems knowledge. In the current systems, such as MYCIN, the values of
the variables are left as symbols, words or values with no meaning. That is, the datum
Temperature is high is left in this form; no attempt is made to give any meaning to
the value high. The values are considered as atomic items with no further
attempts at understanding their meaning. The matching used to determine the
fireability of rules is carried out at this level of semantics. Using the values at this
level of detail provokes some important questions. When two people use the same
word, such as the designer of a system and the user, do they mean the same thing?
Secondly, if a rule has a certain value for a variable in its antecedent, can we still learn
something about the consequent variable if we only know that the value of the antecedent
is close to the value in the rule? The ability to handle these types of problems
requires us to provide a deeper semantics for the values associated with variables.
Just as the predicate logic refines and improves upon the propositional logic by
further decomposing the atomic statements, the theory of approximate reasoning
[6-10] further refines the meaning of the values associated with variables.

The approach we suggest is based upon the idea of fuzzy subsets introduced by
Zadeh [13]. Assume X is a set of objects. A fuzzy subset A of X is a subset in
which the membership grade for each $x \in X$ is an element in the unit interval [0,1].
We denote this membership function A(x). In our approach a proposition such as

Age is old

has the effect of associating with the variable age a possibility distribution [9].
Assume we have the proposition

V is A

where A is some value. We can express A as a fuzzy subset of a base set, the set
of values the variable can assume. For example, if A is old we can express A as a fuzzy
subset of the interval of ages [0,150]. In particular, X is the set of all values that V can
assume. The statement in turn induces a possibility distribution $\Pi_V$ over the set X
such that

$$\Pi_V(x) = A(x),$$

where A(x) is the membership grade of x in A. In particular $\Pi_V(x)$ is seen to be the
possibility that V = x given the data V is A.

In a rule-based expert system the fundamental components of the rules are
conditional statements of the form

if $V_1$ is A then $V_2$ is B.

As suggested by Zadeh [8] propositions of this type can also be seen to
generate possibility distributions. In particular if $V_1$ and $V_2$ have as their base sets
the sets X and Y respectively then

if $V_1$ is A then $V_2$ is B

induces a conditional possibility distribution over $X \times Y$ such that

$$\Pi_{V_2|V_1}(x, y) = \mathrm{Min}\,(1, \; 1 - A(x) + B(y)).$$

An alternative definition is

$$\Pi_{V_2|V_1}(x, y) = \mathrm{Max}\,(1 - A(x), \; B(y)).$$

Thus in this approach the effect of both data statements and rules is to introduce
possibility distributions.
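
The two definitions can be compared on small discrete base sets. The sketch below is illustrative only: the base sets, the membership grades, and the function names are hypothetical.

```python
# Sketch: conditional possibility distributions induced by "if V1 is A then V2 is B"
# on small discrete base sets. All grades below are made-up illustrative values.

def rule_distribution(A, B):
    """First definition: Min(1, 1 - A(x) + B(y)) for every pair (x, y)."""
    return {(x, y): min(1.0, 1.0 - a + b) for x, a in A.items() for y, b in B.items()}

def rule_distribution_alt(A, B):
    """Alternative definition: Max(1 - A(x), B(y)) for every pair (x, y)."""
    return {(x, y): max(1.0 - a, b) for x, a in A.items() for y, b in B.items()}

A = {"x1": 1.0, "x2": 0.3}   # fuzzy subset of X (antecedent value)
B = {"y1": 0.6, "y2": 0.0}   # fuzzy subset of Y (consequent value)

first = rule_distribution(A, B)
alt = rule_distribution_alt(A, B)
for pair in sorted(first):
    print(pair, first[pair], alt[pair])
```

Both definitions agree whenever A(x) is 1, where each reduces to B(y); they can differ for partial antecedent grades, with the first never smaller than the second.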

More complex forms of rules can easily be represented in this approach. If
$V_1, V_2, \ldots, V_n$ are variables taking values in the base sets $X_1, X_2, \ldots, X_n$
respectively then the statement

$V_1$ is $A_1$ and $V_2$ is $A_2$ and . . . and $V_n$ is $A_n$

is seen to induce the joint possibility distribution $\Pi_{V_1, V_2, \ldots, V_n}$ over
$X_1 \times X_2 \times \cdots \times X_n$ such that

$$\Pi_{V_1, V_2, \ldots, V_n}(x_1, x_2, \ldots, x_n) = \mathrm{Min}_i \, [A_i(x_i)].$$

The statement $V_1$ is $A_1$ or $V_2$ is $A_2$ or . . . or $V_n$ is $A_n$ induces the joint possibility
distribution $\Pi_{V_1, V_2, \ldots, V_n}$ over $X_1 \times X_2 \times \cdots \times X_n$ such that

$$\Pi_{V_1, V_2, \ldots, V_n}(x_1, x_2, \ldots, x_n) = \mathrm{Max}_i \, [A_i(x_i)].$$

With these ideas we can easily represent more complicated rules. Let $V_1, V_2, \ldots, V_n$
be variables with base sets $X_1, \ldots, X_n$ respectively and let $U_1, U_2, \ldots, U_p$ be
variables with base sets $Y_1, Y_2, \ldots, Y_p$ respectively. Consider the rule

if $V_1$ is $A_1$ and $V_2$ is $A_2$ . . . and $V_n$ is $A_n$ then $U_1$ is $B_1$.

This induces a conditional possibility distribution $\Pi_{U_1|V_1, V_2, \ldots, V_n}$ over
$X_1 \times X_2 \times \cdots \times X_n \times Y_1$ such that

$$\Pi_{U_1|V_1, \ldots, V_n}(x_1, x_2, \ldots, x_n, y_1) = \mathrm{Min}\,(1, \; 1 - H(x_1, x_2, \ldots, x_n) + B_1(y_1)),$$

where $H(x_1, x_2, \ldots, x_n) = \mathrm{Min}_i \, [A_i(x_i)]$.

Consider the rule

if $V_1$ is $A_1$ and $V_2$ is $A_2$ . . . and $V_n$ is $A_n$
then
$U_1$ is $B_1$ or $U_2$ is $B_2$ . . . or $U_p$ is $B_p$.

This generates the conditional possibility distribution $\Pi_{U_1, U_2, \ldots, U_p|V_1, V_2, \ldots, V_n}$
over the set $X_1 \times X_2 \times \cdots \times X_n \times Y_1 \times Y_2 \times \cdots \times Y_p$ such that

$$\Pi_{U_1, \ldots, U_p|V_1, \ldots, V_n}(x_1, \ldots, x_n, y_1, \ldots, y_p) = \mathrm{Min}\,(1, \; 1 - H(x_1, \ldots, x_n) + G(y_1, \ldots, y_p)),$$

where

$$H(x_1, x_2, \ldots, x_n) = \mathrm{Min}_i \, (A_i(x_i))$$

and

$$G(y_1, y_2, \ldots, y_p) = \mathrm{Max}_j \, (B_j(y_j)).$$

Other complex rules can be expressed in a similar manner.


INFERENCES FROM THE SYSTEM 


The ability to use the database to search the rule base to infer further data in
this approach is based upon the inference laws of the theory of approximate
reasoning. The essential laws for this purpose are the conjunction principle, the
projection principle, the law of fuzzy compositional inference, and the
entailment principle. These laws are related respectively to the law of adjunction, the
law of simplification, the law of modus ponens and the law of addition in the classic
binary propositional logic.

The conjunction principle states that if we have two pieces of data about
some variable, for example

$V$ is $A \;\Rightarrow\; \Pi_V(x) = A(x)$

$V$ is $B \;\Rightarrow\; \Pi_V(x) = B(x)$

then we can conjunct these distributions, getting the proposition V is C where

$$C(x) = A(x) \wedge B(x)$$

$$C(x) = \mathrm{Min}\,(A(x), B(x)).$$

The projection principle allows us to project out marginal possibility distributions
from joint distributions. Assume $\Pi_{V_1, V_2}$ is a joint possibility distribution over the
base set $X_1 \times X_2$; then the projection principle allows us to infer that

$$\Pi_{V_1}(x) = \mathrm{Max}_{\text{all } y \in X_2} \, [\Pi_{V_1, V_2}(x, y)].$$

The law of fuzzy compositional inference, which combines conjunction and
projection, plays a role similar to modus ponens in binary logic. Consider the data

$V$ is $A_1$

and the rule

if $V$ is $A_2$ then $U$ is $B$.

The proposition $V$ is $A_1$ induces the possibility distribution

$$\Pi_V(x) = A_1(x)$$

over X. The rule if $V$ is $A_2$ then $U$ is $B$ induces the conditional possibility
distribution

$$\Pi_{U|V}(x, y) = \mathrm{Min}\,[1, \; 1 - A_2(x) + B(y)]$$

over $X \times Y$.

The law of fuzzy compositional inference says that from these two pieces of
information we can infer that

$$\Pi_U(y) = \mathrm{Max}_x \, [\Pi_V(x) \wedge \Pi_{U|V}(x, y)].$$
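
A minimal sketch of this law on discrete base sets is given below. The base sets and grades are hypothetical, and the helper names are introduced only for illustration.

```python
# Sketch: fuzzy compositional inference on discrete base sets.
# Pi_U(y) = Max over x of Min(Pi_V(x), Pi_{U|V}(x, y)).

def rule_distribution(A2, B):
    """Conditional distribution Min(1, 1 - A2(x) + B(y)) induced by the rule."""
    return {(x, y): min(1.0, 1.0 - a + b) for x, a in A2.items() for y, b in B.items()}

def compose(pi_v, pi_u_given_v, y_values):
    """Combine the data with the rule by conjunction (min), then project on Y (max)."""
    return {
        y: max(min(pi_v[x], pi_u_given_v[(x, y)]) for x in pi_v)
        for y in y_values
    }

# Hypothetical base sets X = {low, high} and Y = {off, on}.
A2 = {"low": 0.0, "high": 1.0}   # rule antecedent: V is high
B = {"off": 0.0, "on": 1.0}      # rule consequent: U is on
rule = rule_distribution(A2, B)

# Data that matches the antecedent exactly recovers the consequent.
print(compose({"low": 0.0, "high": 1.0}, rule, B))
# Data that contradicts the antecedent leaves U unconstrained.
print(compose({"low": 1.0, "high": 0.0}, rule, B))
```

When the data coincides with a crisp antecedent, the inferred distribution is B itself, which is the sense in which the law plays the role of modus ponens. When the antecedent is not satisfied at all, every value of U remains fully possible.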

Consider the situation where there is more than one element in the antecedent:

if $V_1$ is $A_1$ and $V_2$ is $A_2$ then $U$ is $B$.

Let our data be

$V_1$ is $C_1$
$V_2$ is $C_2$

First the rule "if $V_1$ is $A_1$ and $V_2$ is $A_2$ then $U$ is $B$" induces the conditional
possibility distribution

$$\Pi_{U|V_1, V_2}(y, x_1, x_2) = \mathrm{Min}\,[1, \; 1 - (A_1(x_1) \wedge A_2(x_2)) + B(y)].$$

In order to obtain $\Pi_U$ from this via fuzzy compositional inference, we need
$\Pi_{V_1, V_2}$. This can be obtained from our data as

$$\Pi_{V_1, V_2}(x_1, x_2) = \mathrm{Min}\,[C_1(x_1), C_2(x_2)];$$

then

$$\Pi_U(y) = \mathrm{Max}_{(x_1, x_2)} \, [\Pi_{V_1, V_2}(x_1, x_2) \wedge \Pi_{U|V_1, V_2}(y, x_1, x_2)].$$

Consider next the situation in which we have two elements in the
consequent of our rule:

if $V$ is $A$ then $U_1$ is $B_1$ or $U_2$ is $B_2$.

This induces the conditional possibility distribution

$$\Pi_{U_1, U_2|V}(y_1, y_2, x) = \mathrm{Min}\,[1, \; 1 - A(x) + (B_1(y_1) \vee B_2(y_2))].$$

Using the data

$V$ is $C$ $\;(\Pi_V(x) = C(x))$

we can apply fuzzy compositional inference to obtain

$$\Pi_{U_1, U_2}(y_1, y_2) = \mathrm{Max}_x \, [\Pi_V(x) \wedge \Pi_{U_1, U_2|V}(y_1, y_2, x)].$$

The projection principle can now be applied to get either $\Pi_{U_1}$ or $\Pi_{U_2}$. For
example

$$\Pi_{U_1}(y_1) = \mathrm{Max}_{y_2} \, [\Pi_{U_1, U_2}(y_1, y_2)].$$

The entailment principle implies that from the datum V is A, we can infer
V is B, where B is any fuzzy subset such that $A \subset B$.

We can now see the applicability of this theory to rule-based expert
systems. Our global database consists of information of the form $V_i$ is $A_i$, and our rule
base consists of rules of the type "if $V_i$ is $B_i$ then $U_j$ is $C_j$". By application of the
laws of inference, especially compositional fuzzy inference, we can obtain new
information to add to our global database.


QUANTIFIERS IN THE ANTECEDENT 


In this section we shall provide an extension of the ideas presented in the
previous part to allow for the representation of more sophisticated rules in our rule
base, such as:

"if most of the conditions $V_1$ is $A_1$, $V_2$ is $A_2$, . . ., $V_n$ is $A_n$ are satisfied
then U is B"

"if at least half of the conditions $V_1$ is $A_1$, $V_2$ is $A_2$, . . ., $V_n$ is $A_n$ are satisfied
then U is B"

The ability to represent such rules will greatly enhance the ability of any 
expert system to capture the types of rules used by experts. 

We shall provide a methodology for representing such rules in a manner 
consistent with the rest of our formulation and one which allows inferences to be 
made about the value of the consequence using the rule and observed values about the 
variables in the antecedent. This methodology is based upon Zadeh's [14] 
representation of quantifiers and Yager’s procedure for evaluating quantified statements 
[15]. 

The class of rules we are concerned with can be described as consisting of the
following components: an antecedent and a consequent. The antecedent component
consists of a collection of requirements specified in the form of propositions of the
type $V_i$ is $A_i$, where $V_i$ is a variable and $A_i$ is a fuzzy subset of the base set $X_i$. In addition the
antecedent contains a quantifier, Q, such as most, all, almost all, at least one, at
least half, etc. The consequent consists of a proposition of the type U is B.

The rule then reflects the fact that if Q of the antecedent conditions, the
$V_i$ is $A_i$'s, are satisfied then U is B can be added to our knowledge base. The
fundamental difference between this type of rule and the types studied in the previous
section is that rather than requiring all the antecedent conditions to be satisfied, only
Q of them need be satisfied.

Like the other types of conditional rules, these rules also induce a
conditional possibility distribution $\Pi_{U|V_1, V_2, \ldots, V_n}$ over the set
$X_1 \times X_2 \times \cdots \times X_n \times Y$. In particular for any point $(x_1, x_2, \ldots, x_n, y)$ where $x_i \in X_i$ and $y \in Y$

$$\Pi_{U|V_1, V_2, \ldots, V_n}(x_1, x_2, \ldots, x_n, y) = \mathrm{Min}\,[1, \; 1 - H(x_1, x_2, \ldots, x_n) + B(y)].$$

The essential difference lies in the determination of the joint possibility
$H(x_1, \ldots, x_n)$, the component due to the antecedent. The method for determining
this H is based upon ideas developed by Yager [13].

As suggested by Zadeh [14], a linguistic quantifier can be expressed as a
fuzzy subset. In particular there exist three kinds of quantifiers, the first two of
which are of interest to us. A kind one or absolute quantifier is exemplified by values such as
"about 5" and "at least seven", and a kind two or relative quantifier is exemplified by
values such as "almost all" and "at least half." As suggested by Zadeh, a kind one
quantifier can be expressed as a fuzzy subset of the non-negative reals whereas a kind
two quantifier can be expressed as a fuzzy subset of the unit interval. For example, if
$Q_1$ is the kind one quantifier "at least 5", then for each $x \in R^+$, $Q_1(x)$ indicates the
degree to which x satisfies the concept "at least 5". Similarly, if $Q_2$ is a kind two
quantifier, "most", then for any $x \in I$, $Q_2(x)$ indicates the degree to which the
proportion x satisfies the concept "most".

Let Q be a quantifier, either kind I or kind II, with base set W; for kind I,
$W = R^+$ and for kind II, $W = [0, 1]$. Then Q is said to be monotonically
non-decreasing if for any $w_1, w_2 \in W$ such that $w_2 > w_1$ then $Q(w_2) \ge Q(w_1)$. We shall
restrict ourselves to these monotonically non-decreasing quantifiers as they appear to
be the types that naturally appear in the rules used in expert systems.

We can now describe the procedure for obtaining H from the antecedent
quantifier Q and the conditions $V_i$ is $A_i$. We shall initially assume Q is a kind I quantifier.

For any point $(x_1, x_2, \ldots, x_n) \in X_1 \times X_2 \times \cdots \times X_n$, where $X_i$ is the base
set of $A_i$, we obtain $H(x_1, x_2, \ldots, x_n)$ in the following manner.

Let $D(x_1, x_2, \ldots, x_n) = \{A_1(x_1), A_2(x_2), \ldots, A_n(x_n)\}$ and let
$D_i(x_1, x_2, \ldots, x_n)$ be the $i$th largest element in the set $D(x_1, x_2, \ldots, x_n)$.

For any absolute quantifier Q

$$H(x_1, \ldots, x_n) = \mathrm{Max}_i \, [Q(i) \wedge D_i(x_1, \ldots, x_n)].$$

If Q is a relative quantifier then we replace $Q(i)$ by $Q(i/n)$.

Having obtained $H(x_1, \ldots, x_n)$ for every $(x_1, \ldots, x_n) \in X_1 \times X_2 \times \cdots \times X_n$ we
obtain

$$\Pi_{U|V_1, V_2, \ldots, V_n}(x_1, \ldots, x_n, y) = \mathrm{Min}\,[1, \; 1 - H(x_1, \ldots, x_n) + B(y)].$$

This then becomes the induced possibility distribution from the rule,

if Q of $V_1$ is $A_1$, $V_2$ is $A_2$, . . ., $V_n$ is $A_n$ are satisfied then U is B.
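
The computation of H for one point can be sketched as follows. The function names and the piecewise-linear quantifier `most` are assumptions made for illustration; the only ingredients taken from the procedure are the sorted grades and the quantifier.

```python
# Sketch: degree H to which "Q of the antecedent conditions" are satisfied at one point,
# given the grades A_i(x_i). The quantifiers below are hypothetical examples.

def antecedent_satisfaction(grades, Q, relative=True):
    """Max over i of Min(Q(i) or Q(i/n), i-th largest grade)."""
    d = sorted(grades, reverse=True)   # d[i-1] is the i-th largest grade
    n = len(d)
    return max(
        min(Q(i / n) if relative else Q(i), d[i - 1])
        for i in range(1, n + 1)
    )

def most(r):
    """Assumed relative quantifier: 0 up to 1/3, rising linearly to 1 at r = 1."""
    return max(0.0, min(1.0, 1.5 * (r - 1.0 / 3.0)))

def all_of(r):
    """Relative quantifier 'all': satisfied only by the full proportion."""
    return 1.0 if r >= 1.0 else 0.0

def at_least_one(i):
    """Absolute quantifier 'at least one'."""
    return 1.0 if i >= 1 else 0.0

grades = [0.2, 0.9, 0.6]
print(antecedent_satisfaction(grades, most))
print(antecedent_satisfaction(grades, all_of))                        # equals min(grades)
print(antecedent_satisfaction(grades, at_least_one, relative=False))  # equals max(grades)
```

The last two calls show that the quantified antecedent generalizes the earlier cases: "all" reproduces the Min used for conjunctive antecedents, and "at least one" reproduces the Max used for disjunctions.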

If in addition we have in our database the values $V_1$ is $C_1$, $V_2$ is $C_2$, . . .,
$V_n$ is $C_n$, where $C_i$ is a fuzzy subset of $X_i$, then we can obtain a value for U as

U is M

where

$$M(y) = \mathrm{Max}_{(x_1, x_2, \ldots, x_n)} \, [\Pi_{U|V_1, V_2, \ldots, V_n}(x_1, x_2, \ldots, x_n, y) \wedge \Pi_{V_1, V_2, \ldots, V_n}(x_1, x_2, \ldots, x_n)]$$

where $\Pi_{V_1, V_2, \ldots, V_n}(x_1, x_2, \ldots, x_n) = \mathrm{Min}_i \, C_i(x_i)$.

A simple example will illustrate this procedure. Assume $V_1$, $V_2$, $V_3$, U
are variables with base sets

    $X_1$ = {a, b}
    $X_2$ = {c, d}
    $X_3$ = {e, f}
    $Y$ = {g, h}

Let Q be the kind II quantifier most, defined by

    Q(0) = 0, Q(1/3) = 0, Q(2/3) = 1/2, Q(1) = 1.

Assume our rule is

If Q of [$V_1$ is $A_1$, $V_2$ is $A_2$, $V_3$ is $A_3$] are satisfied then U is B

where

    $A_1$ = a = {1/a, 0/b}
    $A_2$ = c = {1/c, 0/d}
    $A_3$ = f = {0/e, 1/f}
    $B$ = g = {1/g, 0/h}

We shall first obtain H. Consider the point (a, c, e):

    D(a, c, e) = {1, 1, 0}

hence

    $D_1$(a, c, e) = 1
    $D_2$(a, c, e) = 1
    $D_3$(a, c, e) = 0.

Using this we get

$$H(a, c, e) = \mathrm{Max}_i \, [Q(i/3) \wedge D_i(a, c, e)] = \mathrm{Max}\,[0 \wedge 1, \; 1/2 \wedge 1, \; 1 \wedge 0] = 1/2 .$$

The following table provides the formulation of H, H = Max[0Adj, l/2Ad 2 , lAd 3 ] 
XJ x 2 X 3 A(xj) A(x 2 ) A(x 3 ) dj d 2 d 3 H 

ace 1 1 0 1 1 0 .5 

a c f 1 1 1 1111 

ade 1 0 0 1001 

a d f 1 0 1 1 1 0 .5 

bee 0 1 0 1000 

b c f 0 1 1 1 1 0 .5 

bde 0 0 0 0000 

bdf 0 0 1 1000 


$\Pi_{U|V_1,V_2,V_3}(x_1, x_2, x_3, y) = \min(1,\ 1 - H(x_1, x_2, x_3) + B(y))$

The following table expresses $\Pi_{U|V_1,V_2,V_3}$:

x1  x2  x3  y    $\Pi_{U|V_1,V_2,V_3}(x_1, x_2, x_3, y)$
a   c   e   g    1 ∧ (1 - 1/2 + 1) = 1
a   c   e   h    1 ∧ (1 - 1/2 + 0) = 1/2
a   c   f   g    1 ∧ (1 - 1 + 1) = 1
a   c   f   h    1 ∧ (1 - 1 + 0) = 0
a   d   e   g    1 ∧ (1 - 0 + 1) = 1
a   d   e   h    1 ∧ (1 - 0 + 0) = 1
a   d   f   g    1 ∧ (1 - 1/2 + 1) = 1
a   d   f   h    1 ∧ (1 - 1/2 + 0) = 1/2
b   c   e   g    1 ∧ (1 - 0 + 1) = 1
b   c   e   h    1 ∧ (1 - 0 + 0) = 1
b   c   f   g    1 ∧ (1 - 1/2 + 1) = 1
b   c   f   h    1 ∧ (1 - 1/2 + 0) = 1/2
b   d   e   g    1 ∧ (1 - 0 + 1) = 1
b   d   e   h    1 ∧ (1 - 0 + 0) = 1
b   d   f   g    1 ∧ (1 - 0 + 1) = 1
b   d   f   h    1 ∧ (1 - 0 + 0) = 1

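Assume we have the data

$V_1 = \{1/a, 0/b\} = a = C_1$
$V_2 = \{1/c, 0/d\} = c = C_2$
$V_3 = \{0/e, 1/f\} = f = C_3$

then to obtain $\Pi_U(y)$ we see that

$$\Pi_U(y) = \max_{(x_1, x_2, x_3)} [\Pi_{U|V_1,V_2,V_3}(x_1, x_2, x_3, y) \wedge C_1(x_1) \wedge C_2(x_2) \wedge C_3(x_3)]$$

hence

$\Pi_U(g) = \max[1 \wedge 1 \wedge 1 \wedge 0,\ 1 \wedge 1 \wedge 1 \wedge 1,\ 1 \wedge 1 \wedge 0 \wedge 0,\ 1 \wedge 1 \wedge 0 \wedge 1,\ 1 \wedge 0 \wedge 1 \wedge 0,\ 1 \wedge 0 \wedge 1 \wedge 1,\ 1 \wedge 0 \wedge 0 \wedge 0,\ 1 \wedge 0 \wedge 0 \wedge 1] = \max[0, 1, 0, 0, 0, 0, 0, 0] = 1$

$\Pi_U(h) = \max[1/2 \wedge 1 \wedge 1 \wedge 0,\ 0 \wedge 1 \wedge 1 \wedge 1,\ 1 \wedge 1 \wedge 0 \wedge 0,\ 1/2 \wedge 1 \wedge 0 \wedge 1,\ 1 \wedge 0 \wedge 1 \wedge 0,\ 1/2 \wedge 0 \wedge 1 \wedge 1,\ 1 \wedge 0 \wedge 0 \wedge 0,\ 1 \wedge 0 \wedge 0 \wedge 1] = \max[0, 0, 0, 0, 0, 0, 0, 0] = 0$

hence, as we would have anticipated,

$U = \{1/g, 0/h\} = g$
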
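Consider the next situation where

$V_1 = \{0/a, 1/b\} = b$
$V_2 = \{1/c, 0/d\} = c$
$V_3 = \{0/e, 1/f\} = f$

Then

$\Pi_U(g) = \max[1 \wedge 0 \wedge 1 \wedge 0,\ 1 \wedge 0 \wedge 1 \wedge 1,\ 1 \wedge 0 \wedge 0 \wedge 0,\ 1 \wedge 0 \wedge 0 \wedge 1,\ 1 \wedge 1 \wedge 1 \wedge 0,\ 1 \wedge 1 \wedge 1 \wedge 1,\ 1 \wedge 1 \wedge 0 \wedge 0,\ 1 \wedge 1 \wedge 0 \wedge 1] = \max[0, 0, 0, 0, 0, 1, 0, 0] = 1$

$\Pi_U(h) = \max[1/2 \wedge 0 \wedge 1 \wedge 0,\ 0 \wedge 0 \wedge 1 \wedge 1,\ 1 \wedge 0 \wedge 0 \wedge 0,\ 1/2 \wedge 0 \wedge 0 \wedge 1,\ 1 \wedge 1 \wedge 1 \wedge 0,\ 1/2 \wedge 1 \wedge 1 \wedge 1,\ 1 \wedge 1 \wedge 0 \wedge 0,\ 1 \wedge 1 \wedge 0 \wedge 1] = \max[0, 0, 0, 0, 0, 1/2, 0, 0] = 1/2$

hence

$U = \{1/g, .5/h\}$

In the situation where

$V_1 = \{1/a, 1/b\}$ = "I don't know"
$V_2 = \{1/c, 0/d\} = c$
$V_3 = \{0/e, 1/f\} = f$

we can show that again

$U = \{1/g, .5/h\}$

In the case where

$V_1 = \{0/a, 1/b\} = b$
$V_2 = \{0/c, 1/d\} = d$
$V_3 = \{0/e, 1/f\} = f$

we can show that

$U = \{1/g, 1/h\}$ (unknown).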
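
The worked example above can be replayed mechanically. The following is a minimal Python sketch, not part of the original paper: the function names `H`, `rule` and `infer` and the dictionary encoding of fuzzy sets are assumptions chosen for illustration. It recomputes H, the induced possibility distribution, and the inferred value of U for the four data sets discussed.

```python
from itertools import product

# Base sets of V1, V2, V3 and U, as in the example above.
X1, X2, X3, Y = "ab", "cd", "ef", "gh"

# Relative (kind II) quantifier "most", tabulated at i/n for n = 3.
Q = {1: 0.0, 2: 0.5, 3: 1.0}  # Q(1/3), Q(2/3), Q(1)

# Antecedent fuzzy sets A1, A2, A3 and consequent B.
A = [{"a": 1, "b": 0}, {"c": 1, "d": 0}, {"e": 0, "f": 1}]
B = {"g": 1, "h": 0}


def H(point):
    """H = Max_i [Q(i/n) ^ D_i], D_i being the i-th largest membership grade."""
    d = sorted((A_i[x] for A_i, x in zip(A, point)), reverse=True)
    return max(min(Q[i], d_i) for i, d_i in enumerate(d, start=1))


def rule(point, y):
    """Induced possibility distribution Min(1, 1 - H + B(y))."""
    return min(1, 1 - H(point) + B[y])


def infer(C):
    """Combine the rule with the data 'V_i is C_i' and project onto Y."""
    return {
        y: max(
            min(rule(p, y), *(C_i[x] for C_i, x in zip(C, p)))
            for p in product(X1, X2, X3)
        )
        for y in Y
    }


is_a, is_b = {"a": 1, "b": 0}, {"a": 0, "b": 1}
unknown = {"a": 1, "b": 1}
is_c, is_d = {"c": 1, "d": 0}, {"c": 0, "d": 1}
is_f = {"e": 0, "f": 1}

for data in ([is_a, is_c, is_f], [is_b, is_c, is_f],
             [unknown, is_c, is_f], [is_b, is_d, is_f]):
    print(infer(data))
```

The four printed distributions correspond, in order, to the four situations treated in the text.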

The following theorem shows that the conjunction of antecedent conditions
is the special case obtained with one of the quantifiers.

Theorem: When Q is the quantifier "all" then the rule

if Q [$V_i$ is $A_i$] are satisfied then U is B     (I)

is equivalent to the proposition

if $V_1$ is $A_1$ and $V_2$ is $A_2$ and ... $V_n$ is $A_n$ then U is B.     (II)

Proof: For rule II we have

if V is H then U is B

where $V = (V_1, V_2, V_3, \ldots, V_n)$ and

$H(x_1, \ldots, x_n) = A_1(x_1) \wedge A_2(x_2) \wedge \cdots \wedge A_n(x_n)$.

For rule I we have

if V is G then U is B

where

$G(x_1, \ldots, x_n) = \max_i [Q(i) \wedge D(i)]$

where $D(i)$ is the $i$th largest element in the set $\Omega = \{A_1(x_1), A_2(x_2), \ldots, A_n(x_n)\}$.
When Q is the quantifier "all", then

$Q(i) = 1$ for $i = n$
$Q(i) = 0$ for $i \ne n$.

In this case

$G(x_1, \ldots, x_n) = 1 \wedge D(n)$ = the $n$th largest element in $\Omega$,

hence $G(x_1, \ldots, x_n) = A_1(x_1) \wedge A_2(x_2) \wedge \cdots \wedge A_n(x_n) = H(x_1, \ldots, x_n)$.

Theorem: When Q is the quantifier "at least one" then the rule

if Q [$V_i$ is $A_i$] are satisfied then U is B     (I)

is equivalent to the proposition

if $V_1$ is $A_1$ or $V_2$ is $A_2$ or ... $V_n$ is $A_n$ then U is B.     (III)

Proof: For rule III we have

if V is H then U is B

where $V = (V_1, V_2, V_3, \ldots, V_n)$ and

$H(x_1, \ldots, x_n) = A_1(x_1) \vee A_2(x_2) \vee \cdots \vee A_n(x_n)$.

For rule I we have

if V is G then U is B

where

$G(x_1, \ldots, x_n) = \max_i [Q(i) \wedge D(i)]$

where $D(i)$ is the $i$th largest element in the set $\Omega = \{A_1(x_1), A_2(x_2), \ldots, A_n(x_n)\}$.
When Q is the quantifier "at least one", then

$Q(i) = 1$ for all $i \ge 1$.

Thus $G$ reduces to $\max_i D(i) = D(1)$, the largest element of $\Omega$, that is

$G(x_1, \ldots, x_n) = A_1(x_1) \vee A_2(x_2) \vee \cdots \vee A_n(x_n)$.
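
Both theorems can be checked numerically on a small grid of membership grades. The Python sketch below is illustrative only: `aggregate`, `q_all`, `q_some` and the grid `levels` are hypothetical names and values, not taken from the paper.

```python
from itertools import product


def aggregate(q, grades):
    """G = Max_i [q(i) ^ D(i)], D(i) being the i-th largest grade."""
    d = sorted(grades, reverse=True)
    return max(min(q(i), d_i) for i, d_i in enumerate(d, start=1))


n = 3


def q_all(i):
    """Quantifier 'all': only the n-th largest grade counts."""
    return 1 if i == n else 0


def q_some(i):
    """Quantifier 'at least one': Q(i) = 1 for every i >= 1."""
    return 1


levels = [0, 0.25, 0.5, 1]  # hypothetical membership grades
for grades in product(levels, repeat=n):
    assert aggregate(q_all, grades) == min(grades)    # conjunction (rule II)
    assert aggregate(q_some, grades) == max(grades)   # disjunction (rule III)
```

The loop raises no assertion error, in agreement with the two proofs: "all" selects the smallest grade and "at least one" the largest.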


CERTAINTY QUALIFICATION 


In providing information to the database and rule base of an expert system, 
as discussed by Buchanan and Duda [ 1 ], a person may not be completely confident as 
to the value he is providing for a variable. Thus a user of a system may provide the 
information that 

V is A with confidence (or certainty) $\alpha$.

In the above the quantity $\alpha$, which is a number in the unit interval,
expresses the degree to which the informant believes that this information is valid.

We would like to provide a mechanism to include these types of qualified
statements in our system. In the spirit of keeping the very powerful structure
which we have developed, the approach will be to assume that a statement

V is A with $\alpha$ confidence

is equivalent to an unqualified statement of the form

V is B.

This new statement implicitly carries a confidence of one. Thus we see that the
statement

V is B with confidence 1

is equivalent to V is B. Thus all our previous work can be seen to have been done
with the implicit certainty one.

We impose the condition that the statement

V is A with zero confidence

should be equivalent to the proposition V is X, where X is the base set of V. Thus
zero confidence is equivalent to saying "I don't know anything about the value of V".

We should note that this is different from probability, for in probability we
should have

V is A with zero probability $\Leftrightarrow$ V is $\bar{A}$ with probability 1.

In general we see that an informant usually makes a tradeoff, in providing
information, between the specificity of the information and the confidence. That is,
the more specific the information he is required to provide, the less confident he can
be about it.

In [17] Yager has suggested a mechanism for transforming statements of the form

V is A with confidence $\alpha$

into statements of the form

V is B

with implied confidence one. In particular if A and B are fuzzy subsets of X then

V is A with confidence $\alpha$

can be transformed into the equivalent proposition V is B where for any $x \in X$

$$B(x) = (\alpha \wedge A(x)) + (1 - \alpha)$$

NOTE: For the statement V is A with confidence 1, we get

V is A.

Proof: $B(x) = (\alpha \wedge A(x)) + (1 - \alpha) = (1 \wedge A(x)) + 0 = A(x)$.

NOTE: For the statement V is A with confidence 0, we get the unqualified
proposition V is X.

Proof: $B(x) = (\alpha \wedge A(x)) + (1 - \alpha) = (0 \wedge A(x)) + (1 - 0) = 1$.
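
As a minimal sketch of this transformation, assuming a fuzzy subset is stored as a Python dictionary of exact fractions (the name `qualify` and the sample set `A` are illustrative, not from the paper):

```python
from fractions import Fraction as F


def qualify(A, alpha):
    """Return B with B(x) = (alpha ^ A(x)) + (1 - alpha)."""
    return {x: min(alpha, grade) + (1 - alpha) for x, grade in A.items()}


A = {"x1": F(1), "x2": F(1, 2), "x3": F(0)}  # hypothetical fuzzy subset of X

print(qualify(A, F(1)))      # confidence 1: B = A
print(qualify(A, F(0)))      # confidence 0: every grade is 1, i.e. B = X
print(qualify(A, F(7, 10)))  # intermediate confidence: grades 1, 4/5, 3/10
```

The first two calls reproduce the two notes above; the third shows how an intermediate confidence raises every membership grade without exceeding 1.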

In [18] Yager has introduced a measure of specificity associated with a fuzzy
subset. Assume F is a fuzzy subset of the finite set X; then the specificity of F,
S(F), is defined as

$$S(F) = \int_0^{\alpha_{\max}} \frac{1}{\operatorname{card} F_\alpha} \, d\alpha$$

where $F_\alpha = \{x \mid F(x) \ge \alpha\}$, card $F_\alpha$ is the number of elements in $F_\alpha$ and $\alpha_{\max}$ is
the largest membership grade in F. For the case where F is normal,

$$S(F) = \int_0^1 \frac{1}{\operatorname{card} F_\alpha} \, d\alpha$$

Yager [18] has shown for the case of normal fuzzy subsets that if $F \subset G$, that is
$F(x) \le G(x)$ for all $x \in X$, then

$S(F) \ge S(G)$.

The following theorems reinforce our observations about the tradeoff
between specificity and certainty.

Lemma: If A is normal, then the transformation of the proposition
V is A with $\alpha$ certainty into the proposition V is B will yield B as a normal set.

Proof: Let x be such that A(x) = 1, then

$B(x) = (\alpha \wedge A(x)) + (1 - \alpha) = (\alpha \wedge 1) + (1 - \alpha) = \alpha + (1 - \alpha) = 1$.

In the following theorems A is assumed normal.

Theorem: Assume the proposition V is A with $\alpha$ certainty transforms into
the proposition V is B; then

$S(A) \ge S(B)$.

Proof: We shall first show that for each $x \in X$, $B(x) \ge A(x)$. From the definition,

$B(x) = (\alpha \wedge A(x)) + (1 - \alpha)$.

Assume $\alpha \ge A(x)$; then

$B(x) = (\alpha \wedge A(x)) + (1 - \alpha) = A(x) + (1 - \alpha) \ge A(x)$.

Assume $\alpha < A(x)$; then

$B(x) = \alpha + (1 - \alpha) = 1 \ge A(x)$.

Since $B(x) \ge A(x)$ for each x, it follows that $S(A) \ge S(B)$.

Thus we see that the act of qualifying a proposition by a certainty has the
effect of reducing the specificity of its unqualified equivalent.

Theorem: Assume V is A with $\alpha_1$ certainty transforms into V is $B_1$ and
V is A with $\alpha_2$ certainty transforms into V is $B_2$. If $\alpha_1 > \alpha_2$, then

$S(B_1) \ge S(B_2)$.

Proof: $B_1(x) = (\alpha_1 \wedge A(x)) + (1 - \alpha_1)$ and $B_2(x) = (\alpha_2 \wedge A(x)) + (1 - \alpha_2)$. There are three
possible situations:

1. $A(x) \le \alpha_2 \le \alpha_1$. In this case

$B_1(x) = A(x) + (1 - \alpha_1)$
$B_2(x) = A(x) + (1 - \alpha_2)$,

since $\alpha_1 > \alpha_2$, then $(1 - \alpha_1) < (1 - \alpha_2)$ and hence $B_2(x) > B_1(x)$.

2. $\alpha_2 \le A(x) \le \alpha_1$. In this case

$B_1(x) = A(x) + (1 - \alpha_1)$
$B_2(x) = \alpha_2 + (1 - \alpha_2) = 1 \ge B_1(x)$.

3. $\alpha_2 \le \alpha_1 \le A(x)$. In this case

$B_1(x) = \alpha_1 + (1 - \alpha_1) = 1$
$B_2(x) = \alpha_2 + (1 - \alpha_2) = 1 \ge B_1(x)$.

Since $B_1(x) \le B_2(x)$ for all x, $S(B_1) \ge S(B_2)$.
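
The tradeoff stated by these theorems can be observed numerically. The sketch below is an illustration under stated assumptions: `specificity` evaluates the integral exactly for a finite base set (the level set is piecewise constant in the level), `qualify` is the certainty transformation defined earlier in this section, and the fuzzy set `A` is a made-up normal example.

```python
from fractions import Fraction as F


def specificity(fuzzy):
    """S(F): integral over (0, a_max] of 1/card(F_a), for a finite base set."""
    grades = sorted(fuzzy.values(), reverse=True) + [F(0)]
    # For a level in (grades[k], grades[k-1]] the level set has exactly k elements.
    return sum((grades[k - 1] - grades[k]) / k for k in range(1, len(grades)))


def qualify(A, alpha):
    """Certainty qualification: B(x) = (alpha ^ A(x)) + (1 - alpha)."""
    return {x: min(alpha, g) + (1 - alpha) for x, g in A.items()}


A = {"x1": F(1), "x2": F(1, 2), "x3": F(0)}  # hypothetical normal fuzzy subset

# Specificity of the transformed set for alpha = 0, 1/4, 1/2, 3/4, 1.
s = [specificity(qualify(A, F(k, 4))) for k in range(5)]
print(s)
```

The printed specificities increase with the certainty level, from 1/3 (no information, B = X) up to S(A) = 3/4 at certainty one.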

It should be noted that this approach to certainty qualification can easily be
applied to rules in the expert system. Consider the rule

if V is A then U is B with $\alpha$ certainty

where A and B are fuzzy subsets of X and Y respectively. This transforms into the
possibility distribution

$$\Pi_{U|V}(x, y) = (H(x, y) \wedge \alpha) + (1 - \alpha)$$

where

$$H(x, y) = \min(1,\ 1 - A(x) + B(y)).$$

REPRESENTATION OF DEFAULT KNOWLEDGE 


The construction of useful knowledge based systems requires the
representation and manipulation of so-called commonsense knowledge.
Commonsense knowledge is very often characterized by pieces of knowledge that are
usually true but not necessarily always true. The essential feature of this is the
assumption of a piece of knowledge without conclusive evidence of its truth. Within
this approach one assumes some piece of commonsense knowledge to be valid if it is
consistent or possible within the framework of what we already know.

It is the use of the absence of contradictory evidence which strongly
characterizes the process of commonsense reasoning. That is, classical reasoning
systems require the certainty of a proposition before asserting its truth. In
commonsense reasoning systems a fact is asserted as true if there exists a
possibility of its being true, that is, if nothing contradicts it. Knowledge of this type is often
called defeasible because we want the option of withdrawing it if contradictory
evidence subsequently appears.

Systems which allow for the inference of information based upon the lack of
some contradictory fact are faced with the problem that their reasoning process is
nonmonotonic. In particular some proposition that was inferred may cease to be
inferable with the acquisition of further knowledge.

In [16] Yager introduced a reasoning system which we shall call fuzzy
default reasoning. This system is rooted in the theory of approximate reasoning [17].
In order to discuss this system we need to introduce some additional concepts.

Intuitively speaking the statement

V is A

says that the value of V lies in the subset A. Knowledge that V is A can be used to
help determine the viability of other statements. If

V is B

is a second statement we define

$$\mathrm{Poss}[V \text{ is } B / V \text{ is } A] = \max_x [A(x) \wedge B(x)].$$

Formally this definition captures a measure of the degree of intersection between the
two sets A and B. Pragmatically, this measure provides an upper bound on the truth
of the statement V is B given V is A. That is, if A and B intersect then it is possible
that V lies in B, i.e. that V is B is true, given that V is A. We shall see this is a measure of
consistency of the two statements. A second closely related definition is

$$\mathrm{Cert}[V \text{ is } B / V \text{ is } A] = 1 - \mathrm{Poss}[V \text{ is } \bar{B} / V \text{ is } A].$$

We note that an equivalent formulation is

$$\mathrm{Cert}[V \text{ is } B / V \text{ is } A] = \min_x [\bar{A}(x) \vee B(x)].$$

Formally this definition captures the degree to which A is contained in B.
Pragmatically this measure provides a lower bound on the truth of V is B given V is
A. In general

$$\mathrm{Cert}[V \text{ is } B / V \text{ is } A] \le \mathrm{Poss}[V \text{ is } B / V \text{ is } A].$$

In binary reasoning systems we require that

Cert[V is B/KB] = 1,

where KB is our knowledge base, to infer that V is B is true. We shall see that in
the commonsense environment we essentially allow an inference of a commonsense
piece of knowledge to occur if

Poss[V is B/KB] = 1.
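
The two measures are simple to compute on a finite base set. The following Python sketch is illustrative only; the function names `poss` and `cert` and the sample sets are assumptions, and the complement is taken as one minus the membership grade.

```python
def poss(B, A):
    """Poss[V is B / V is A] = Max_x [A(x) ^ B(x)]."""
    return max(min(A[x], B[x]) for x in A)


def cert(B, A):
    """Cert[V is B / V is A] = Min_x [(1 - A(x)) v B(x)]."""
    return min(max(1 - A[x], B[x]) for x in A)


# Hypothetical fuzzy subsets of X = {1, 2, 3, 4}.
A = {1: 1.0, 2: 0.5, 3: 0.0, 4: 0.0}
B = {1: 1.0, 2: 0.25, 3: 1.0, 4: 0.0}
not_B = {x: 1 - g for x, g in B.items()}

print(poss(B, A))           # upper bound on the truth of "V is B"
print(cert(B, A))           # lower bound on the truth of "V is B"
print(1 - poss(not_B, A))   # same value as cert(B, A)
```

For these sets the certainty (0.5) lies below the possibility (1.0), and the two formulations of Cert agree.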

Recall that a rule is of the form

if V is A then U is B,

where A and B are fuzzy subsets of the base sets X and Y. The above statement gets
translated into a joint canonical statement

(V, U) is D

where D is a fuzzy subset of $X \times Y$ such that

$$D(x, y) = (1 - A(x)) \vee B(y).$$

If we have two pieces of knowledge

if V is A then U is B
V is E

then we can conjunct these to get

(V, U) is H.

Here H is a subset of the cartesian space $X \times Y$ such that

$$H = E \cap D = (\bar{A} \cup B) \cap E = (\bar{A} \cap E) \cup (B \cap E).$$

The inferred value of U, denoted G, can be represented as

U is G

where

$$G(y) = \max_x [\bar{A}(x) \wedge E(x)] \vee B(y) = \mathrm{Poss}(\bar{A}/E) \vee B(y) = (1 - \mathrm{Cert}(A/E)) \vee B(y).$$

We see that if we are certain that A occurs given E, effectively $E \subset A$, then we get
G = B. If $\mathrm{Poss}(\bar{A}/E) = 1$ then we get G = Y = "unknown", hence no inference is
made.
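
A small Python sketch of this inference step, under the assumption that fuzzy sets are dictionaries over finite base sets and that the datum E is normal (the name `infer` and the sample sets are illustrative):

```python
def infer(A, B, E):
    """G(y) = Max_x [(1 - A(x)) ^ E(x)] v B(y), the inferred value of U."""
    poss_not_a = max(min(1 - A[x], E[x]) for x in A)
    return {y: max(poss_not_a, B[y]) for y in B}


# Hypothetical rule "if V is A then U is B" on small base sets.
A = {"x1": 1, "x2": 1, "x3": 0}
B = {"y1": 1, "y2": 0}

inside = {"x1": 1, "x2": 0, "x3": 0}   # E contained in A
outside = {"x1": 0, "x2": 0, "x3": 1}  # E disjoint from A

print(infer(A, B, inside))   # G = B: the rule fires fully
print(infer(A, B, outside))  # G = Y: "unknown", no inference is made
```

The two calls illustrate the two extreme situations described above.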

In [16,18,19] Yager has suggested that we can use possibility qualification
as a basis for the implementation of many different kinds of commonsense
knowledge. A possibility qualified statement is of the form

V is A is possible.

This statement characterizes a piece of information that says our knowledge of the
value of V is such that it is possible (or consistent) with it to assume that V lies in
the set A. Note that it doesn't specifically say V lies in A. Formally this statement
gets translated into

V is $A^+$

where $A^+$ is a subset of the power set of the base set X. In particular for any subset
G of X

$$A^+(G) = \mathrm{Poss}[A/G] = \max_x [A(x) \wedge G(x)].$$

Essentially $A^+$ is made up of the subsets of X which intersect, i.e. are consistent, with
A.

Closely related to possibility qualification is certainty qualification. A
statement

V is A is certain

translates into

V is $A^v$

where $A^v$ is a subset of the power set of X, the base set of A, such that for any
subset F of X

$$A^v(F) = \mathrm{Cert}(A/F) = 1 - \mathrm{Poss}(\bar{A}/F).$$

We shall now describe the representation of some primary types of commonsense
knowledge by the possibilistic reasoning approach.

We shall initially consider the statement

typically V is A.

The interpretation of "typically V is A" afforded by Reiter's default reasoning
system [20] is to say "if we have not established V is $\neg A$ then assume V is A". Thus
we can translate the above into

if V is A is possible then V is A.

Using our translation rules we get

if V is $A^+$ then V is A.

This translates into

V is $\neg(A^+) \cup A$.

We shall denote $\neg(A^+)$ as $A^*$, hence we get

V is $(A^* \cup A)$.

Furthermore assume that our knowledge base consists simply of the fact that

V is B.

Combining this with our typical knowledge we get V is D where

$D = (A^* \cap B) \cup (A \cap B)$.

Furthermore as discussed in [16,18,19] this becomes

$$D(x) = (B(x) \wedge (1 - \mathrm{Poss}[A/B])) \vee (A(x) \wedge B(x)).$$

Two extremal cases should be noted. If our typical value A is completely
inconsistent with our known value, $A \cap B = \emptyset$, then Poss[A/B] = 0
and thus D(x) = B(x) and hence

V is B.

We have discounted our typical information when it conflicted with our knowledge
base.

On the other hand if A has some consistency with B, $A \cap B \ne \emptyset$, so that
Poss[A/B] = 1, then we get

$D(x) = B(x) \wedge A(x)$

and hence

V is $A \cap B$.

Thus when our typical knowledge doesn't contradict our firm knowledge we conjunct
these sources of knowledge. In the special case when B is unknown, B = X, then we
get

V is A.
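
The behaviour of the default rule in the cases just described can be sketched in a few lines of Python. The function name `typically` and the bird example are hypothetical illustrations, not taken from the paper; crisp sets are encoded with grades 0 and 1.

```python
def typically(A, B):
    """D(x) = (B(x) ^ (1 - Poss[A/B])) v (A(x) ^ B(x))."""
    poss = max(min(A[x], B[x]) for x in A)
    return {x: max(min(B[x], 1 - poss), min(A[x], B[x])) for x in A}


# Hypothetical base set of ways of moving; default: "typically V is fly".
A = {"fly": 1, "walk": 0, "swim": 0}

unknown = {"fly": 1, "walk": 1, "swim": 1}   # B = X: nothing is known
no_fly = {"fly": 0, "walk": 1, "swim": 1}    # B contradicts the default
fly_or_walk = {"fly": 1, "walk": 1, "swim": 0}

print(typically(A, unknown))      # the default is adopted: V is A
print(typically(A, no_fly))       # the default is discounted: V is B
print(typically(A, fly_or_walk))  # consistent: V is A intersected with B
```

Adding the contradictory fact `no_fly` withdraws the conclusion drawn from `unknown`, which is the nonmonotonic behaviour discussed at the start of this section.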


CONCLUSION 

We have discussed the applicability of the theory of approximate reasoning
to rule based expert systems. The novel aspects of this work concern our
introduction of an approach in this framework for the inclusion of complex rules and
the ability to introduce certainty qualification into our system.

The paper starts with ideas of possibility qualification and certainty qualification
for specifying the possible range of a variable whose value is ill-known. The notion 
of possibility which is used for that purpose is not the standard one in possibility 
theory, although the two notions of possibility can be related. Based on these 
considerations four distinct types of rules with different semantics involving 
gradedness and uncertainty are then introduced. The combination operations which 
appear for taking advantage of the available knowledge are all derived from the 
intended semantics of the rules. The processing of these four types of rules is studied 
in detail. Fuzzy rules modelling preference in decision processes are also discussed. 

1. INTRODUCTION 

The applications of fuzzy set and possibility theories to rule-based expert 
systems have been mainly developed along two lines in the eighties : i) the 
generalization of the certainty factor approach introduced in MYCIN (Buchanan and 
Shortliffe, 1984) by enlarging the possible operations to be used for combining the 
uncertainty coefficients ; ii) the handling of vague predicates in the expression of the 
expert rules or of the available information. The first line of research is exemplified 
by the inference system RUM (Bonissone et al., 1987) where a control layer chooses 
the triangular norm operation governing the propagation of uncertainty, or by the 
inference system MILORD (Godo et al., 1988) where the combination and 
propagation operations associated with each rule reflect the expert knowledge. The
second trend has motivated a huge amount of literature, especially for discussing the
multiple-valued logic implication connective $\to$ to be used in the modelling of a rule
of the form "if X is A then Y is B" by means of a fuzzy relation R (defined by
$\mu_R(x, y) = \mu_A(x) \to \mu_B(y)$). The choice of the implication function has been
investigated from an algebraic point of view by classifying the implications
according to axiomatic properties, and from a deduction-oriented perspective by
requiring some prescribed kind of results for the generalized modus ponens applied to
fuzzy "if... then..." rules (e.g. Mizumoto and Zimmermann (1982), Dubois and
Prade (1984), Trillas and Valverde (1985), Bouchon (1987), Smets and Magrez
(1987)). Although the available results indeed enable us to jointly choose an
implication function and the conjunction to be used for combining the two premisses 
"X is A' " and "if X is A then Y is B" in order to obtain an expected behavior for the 
generalized modus ponens, these approaches do not really consider the intended 
semantics of the rules. See (Dubois and Prade, 1990d) and (Dubois, Lang and Prade, 
1990) for an extensive overview and a discussion of the generalized modus ponens 
and of the certainty factor approaches respectively. 

In this paper, extending recently obtained results (Dubois and Prade, 1989a, 
1990b, d), we show how the choice of the implication operation is induced by the 
type of rule we have to model in the framework of possibility theory. The approach 
which is proposed formalizes ideas which have been more empirically studied by 
Bouchon (1988) and Desprès (1989) about the role of different kinds of modifiers in the
expression and the intended meaning of fuzzy rules and can be also somewhat related 
to recent works about possibility and necessity qualifications (Magrez and Smets, 
1989 ; Dubois and Prade, 1990a ; Fonck, 1990 ; Yager, 1990). 

We first discuss two distinct ways of specifying a possibility distribution, either 
by possibility or by certainty qualification. This can be regarded as a new approach in 
possibility theory. The consequences of the mode of qualification on the 
manipulation of the pieces of knowledge which are thus specified, are emphasized. 
The notion of possibility which is used in possibility qualification does not correspond
to the standard notion of possibility measure in possibility theory ; the links between 
the two concepts are clarified in Section 3. Using the ideas of Section 2, Section 4 
introduces four different types of rules which are closely related to particular types of 
fuzzy truth-values (or, if we prefer, of modifiers). Section 5 discusses the behavior of 
these rules in the generalized modus ponens and when used in parallel. Section 6 is 
devoted to another kind of fuzzy rules expressing preference. 

2. TWO WAYS OF SPECIFYING A POSSIBILITY DISTRIBUTION 

2.1. The Concept of a Possibility Distribution 

A possibility distribution is a function $\pi_x$, attached to a variable x, from a so-called
universe of discourse U to the real interval [0,1] which aims at representing our
current view of the feasible, or epistemically possible, or admissible values of a
single-valued variable x whose domain is U. Depending on the interpretation, $\pi_x(u)$
estimates the degree of ease, the degree of unsurprisingness or of expectedness, the
degree of acceptability or of preference attached to the proposition "the value of x is
u", i.e. x = u. The possibility distribution $\pi_x$ is just a way of specifying an ordering
among the elements of U, which expresses that the closer $\pi_x(u)$ is to 1 (resp. to 0), the more
(resp. the less) feasible, epistemically possible, or admissible, according to the
interpretation, the value u is for x. In the following we shall use the neutral term
"possible", saying that $\pi_x(u)$ estimates the extent to which u is possible for x, when
it is not interesting to put forward any specific interpretation. Thus the interval [0,1]
is just considered here as an ordinal scale where 1 stands for complete possibility and
0 for complete impossibility. As soon as U entirely covers the domain of the variable
x, it is natural to require that there exists at least one element u of U which can be
considered as completely possible for x, i.e. such that $\pi_x(u) = 1$; then $\pi_x$ is said to
be normalized.

2.2. Specifications by Means of Ordinary Subsets 

A possibility distribution is not usually specified as such, but by the
qualification of subsets of U. Let A be an ordinary subset of U. It can be qualified
either in terms of possibility or in terms of certainty in order to specify a possibility
distribution $\pi_x$ over A. Namely

i) if A is a (completely) possible range for x, it means that $\forall u \in A$, $\pi_x(u) = 1$ and
$\pi_x$ remains unspecified outside of A, or equivalently, that

"A is possible" is translated by $\forall u \in U$, $\mu_A(u) \le \pi_x(u)$     (1)

where $\mu_A$ is the {0,1}-valued characteristic function of A.

ii) if it is (completely) certain that the value of x lies in A, it means that any value
outside A is (completely) impossible, i.e. $\forall u \notin A$, $\pi_x(u) = 0$ and $\pi_x$ is
unspecified over A, or if we prefer

"A is certain" is translated by $\forall u \in U$, $\pi_x(u) \le \mu_A(u)$.     (2)

Thus, let $A_c$ and $A_s$ be two ordinary subsets of U satisfying (1) and (2)
respectively for a possibility distribution $\pi_x$; then we have

$\forall u \in U$, $\mu_{A_c}(u) \le \pi_x(u) \le \mu_{A_s}(u)$     (3)

which expresses that $A_c$ is included in the core of $\pi_x$, i.e. $\{u \in U, \pi_x(u) = 1\}$, while
$A_s$ contains the support of $\pi_x$, i.e. $\{u \in U, \pi_x(u) > 0\}$.

Let $A_1$ and $A_2$ be two subsets of U which both satisfy (1) for the same
possibility distribution $\pi_x$; then we see that

"$A_1$ is possible" and "$A_2$ is possible" $\Rightarrow$ $\forall u \in U$, $\max(\mu_{A_1}(u), \mu_{A_2}(u)) \le \pi_x(u)$     (4)

while if $A_1$ and $A_2$ both satisfy (2), we have

"$A_1$ is certain" and "$A_2$ is certain" $\Rightarrow$ $\forall u \in U$, $\pi_x(u) \le \min(\mu_{A_1}(u), \mu_{A_2}(u))$.     (5)

We observe that pieces of knowledge which are simultaneously qualified in terms of
possibility are combined by means of the max operation in a union-like manner, while
pieces of knowledge which are simultaneously qualified in terms of certainty are
combined by means of the min operation in an intersection-like manner.

Let us consider cases of qualification where the possibility or the certainty is not
complete but corresponds to an intermediary level $\alpha$ in the scale [0,1]. It leads to the
two following generalizations of (1) and (2):

i) the statement "A is a possible range for x at least at the degree $\alpha$" will be
understood as $\forall u \in A$, $\pi_x(u) \ge \alpha$, which leads to

"A is $\alpha$-possible" is translated by $\forall u \in U$, $\min(\mu_A(u), \alpha) \le \pi_x(u)$.     (6)

Note that for $\alpha = 1$, (1) is recovered.

ii) the statement "it is certain at least at the degree $\alpha$ that the value of x is in A"
will be interpreted as: any value outside A is at most possible at the
complementary degree, namely $1 - \alpha$, i.e. $\forall u \notin A$, $\pi_x(u) \le 1 - \alpha$, which leads to

"A is $\alpha$-certain" is translated by $\forall u \in U$, $\pi_x(u) \le \max(\mu_A(u), 1 - \alpha)$.     (7)

Note that for $\alpha = 1$, (2) is recovered. Certainty qualification was first discussed by
Yager (1984) and Prade (1985); in this latter reference, (7) already appears with
equality (i.e. the least restrictive possibility distribution compatible with (7) is
chosen). Possibility-qualification goes back to Zadeh (1978b) and Sanchez (1978).
Clearly (4) and (5) can be straightforwardly extended to "$A_i$ is $\alpha_i$-possible" and
to "$A_i$ is $\alpha_i$-certain", for i = 1,2, using (6) and (7) respectively.
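
Constraints (6) and (7) bound the possibility distribution from below and from above. A minimal Python sketch, assuming a finite universe and an ordinary subset A stored as a Python set (`lower_bound`, `upper_bound` and the sample data are illustrative names, not from the paper):

```python
def lower_bound(A, alpha, U):
    """(6): 'A is alpha-possible' forces pi(u) >= min(mu_A(u), alpha)."""
    return {u: min(1 if u in A else 0, alpha) for u in U}


def upper_bound(A, alpha, U):
    """(7): 'A is alpha-certain' forces pi(u) <= max(mu_A(u), 1 - alpha)."""
    return {u: max(1 if u in A else 0, 1 - alpha) for u in U}


U = range(1, 6)   # hypothetical universe {1, ..., 5}
A = {2, 3}        # ordinary subset being qualified

low = lower_bound(A, 0.75, U)
high = upper_bound(A, 0.75, U)
print(low)    # 0.75 inside A, 0 outside
print(high)   # 1 inside A, 0.25 outside
```

Any possibility distribution lying pointwise between `low` and `high` is compatible with both qualified statements; taking `high` itself corresponds to the choice of equality in (7) mentioned above.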

2.3. Specifications by Means of Fuzzy Subsets 

We now consider the more general case where the subset A which is qualified is
fuzzy. It is well-known that a fuzzy (sub)set A can be represented in terms of a
collection of ordinary subsets, namely the $\alpha$-cuts $A_\alpha = \{u \in U, \mu_A(u) \ge \alpha\}$ of A.
We have (Zadeh, 1971)

$\forall u$, $\mu_A(u) = \sup_{\alpha \in (0,1]} \min(\mu_{A_\alpha}(u), \alpha)$     (8)

Then we immediately notice that, if we interpret "A is possible" (where A is fuzzy)
as the conjunctive collection of possibility-qualified non-fuzzy statements of the form
"$A_\alpha$ is $\alpha$-possible", $\forall \alpha \in (0,1]$, we obtain, using (6) for the possibility-qualification
and (4) for the max-combination (here extended to a sup-combination
since the collection may be not finite)

$\forall u \in U$, $\sup_{\alpha \in (0,1]} \min(\mu_{A_\alpha}(u), \alpha) \le \pi_x(u)$
i.e. $\forall u \in U$, $\mu_A(u) \le \pi_x(u)$

which clearly generalizes (1) to a fuzzy set A.
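
The representation formula (8) can be verified on a finite universe, where the supremum is attained at one of the positive membership grades of A. The following Python sketch is an illustration with a made-up fuzzy set; `cut`, `levels` and `rebuilt` are assumed names.

```python
mu_A = {"u1": 1.0, "u2": 0.75, "u3": 0.25, "u4": 0.0}  # hypothetical fuzzy set


def cut(mu, alpha):
    """Alpha-cut: the ordinary set of elements with grade at least alpha."""
    return {u for u, g in mu.items() if g >= alpha}


# Candidate levels: the positive grades of A (enough on a finite universe).
levels = sorted({g for g in mu_A.values() if g > 0})

# Right-hand side of (8), with mu of a cut taken as its 0/1 indicator.
rebuilt = {
    u: max((min(1 if u in cut(mu_A, a) else 0, a) for a in levels), default=0)
    for u in mu_A
}
print(rebuilt)
```

The rebuilt membership grades coincide with those of `mu_A`, as (8) states.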

From (8), by taking the complement of A, i.e. the complement to 1 of its
membership function, we can obtain another representation formula, namely

$\forall u \in U$, $\mu_{\bar{A}}(u) = 1 - \mu_A(u) = \inf_{\alpha \in (0,1]} \max(\mu_{(\bar{A})_{\overline{1-\alpha}}}(u), 1 - \alpha)$     (9)

where the overbar on a subset denotes the complementation and we use the identity
$\overline{(A_\alpha)} = (\bar{A})_{\overline{1-\alpha}}$, with $B_{\bar{\beta}}$ denoting the strong $\beta$-cut of a fuzzy set B, namely
$\{u \in U, \mu_B(u) > \beta\}$ (i.e. '$\ge$' is changed into '>' in the definition of the level cut).
Clearly (9) applies to any A and thus (9) still holds changing A into $\bar{A}$, which gives

$\forall u \in U$, $\mu_A(u) = \inf_{\alpha \in (0,1]} \max(\mu_{A_{\overline{1-\alpha}}}(u), 1 - \alpha)$     (10)

Then, if we interpret "A is certain" (where A is fuzzy) as the conjunctive
combination of certainty-qualified non-fuzzy statements of the form "$A_{\overline{1-\alpha}}$ is $\alpha$-certain",
$\forall \alpha \in (0,1]$, we obtain using (7) and the min-combination (5) (extended to
an inf-combination)

$\forall u \in U$, $\pi_x(u) \le \inf_{\alpha \in (0,1]} \max(\mu_{A_{\overline{1-\alpha}}}(u), 1 - \alpha)$
i.e. $\forall u \in U$, $\pi_x(u) \le \mu_A(u)$     (11)

which clearly generalizes (2) to a fuzzy set A. The interpretation of "A is certain"
(i.e. we are completely certain that the possible values of x are restricted by $\mu_A$), as
the conjunction of statements of the form "$A_{\overline{1-\alpha}}$ is (at least) $\alpha$-certain" is quite
natural. Indeed we are completely certain ($\alpha = 1$) that the support $A_{\bar{0}} = \{u \in U,
\mu_A(u) > 0\}$ of A contains the value of x, while the certainty that the strong $\beta$-cut of
A includes the value of x decreases when $\beta$ increases, because the $\beta$-cut becomes
smaller due to the nestedness property $\beta \le \beta' \Rightarrow A_{\bar{\beta}} \supseteq A_{\bar{\beta}'}$ (here $\beta = 1 - \alpha$). Note
also that "$A_{\overline{1-\alpha}}$ is at least $\alpha$-certain", according to (7), means that any value in
$\overline{A_{\overline{1-\alpha}}} = (\bar{A})_\alpha$ is at most possible at the degree $1 - \alpha$ (or if we prefer at least
impossible at the degree $\alpha$).

An immediate consequence of (9) and (11) is that if the fuzzy set A is both a 
completely possible and completely certain fuzzy range for the value of x in the 
above sense, then we should have 

$\forall u \in U$, $\pi_x(u) = \mu_A(u)$     (12)

i.e. the equality with which Zadeh (1978a) starts the introduction of possibility 
theory. 

We now generalize (6) and (7) to a fuzzy set A, thus introducing gradedness in
possibility and certainty qualification of fuzzy subsets. "A is possible" has been
interpreted as "$A_\beta$ is at least $\beta$-possible", $\forall \beta \in (0,1]$. Then it is natural to interpret
"A is $\alpha$-possible" as: $\forall \beta \ge \alpha$, "$A_\beta$ is at least $\alpha$-possible" and $\forall \beta < \alpha$, "$A_\beta$ is at
least $\beta$-possible". In other words, the smallest $\beta$-cuts, with $\beta$ close to 1, are only
assigned a minimal degree of possibility equal to $\alpha$ (instead of $\beta$). Then using the
max-combination (4), we get

$\forall u \in U$, $\max(\sup_{\beta \ge \alpha} \min(\mu_{A_\beta}(u), \alpha),\ \sup_{\beta < \alpha} \min(\mu_{A_\beta}(u), \beta)) \le \pi_x(u)$
i.e.
$\max(\min(\sup_{\beta \ge \alpha} \min(\mu_{A_\beta}(u), \beta), \alpha),\ \sup_{\beta < \alpha} \min(\mu_{A_\beta}(u), \beta)) \le \pi_x(u)$
i.e.
$\forall u \in U$, $\min(\sup_{\beta \in (0,1]} \min(\mu_{A_\beta}(u), \beta), \alpha) \le \pi_x(u)$ (since $\sup_{\beta < \alpha} \min(\mu_{A_\beta}(u), \beta) \le \alpha$)
i.e. $\forall u \in U$, $\min(\mu_A(u), \alpha) \le \pi_x(u)$     (13)

which extends (6) to the case where A is fuzzy. Interestingly enough, (13) was already
discussed by Zadeh (1978b) and Sanchez (1978) for possibility-qualification purposes.
Similarly, "A is certain" has been interpreted as "$A_{\overline{1-\beta}}$ is at least $\beta$-certain",
$\forall \beta \in (0,1]$. Then "A is $\alpha$-certain" will be interpreted as "$A_{\overline{1-\beta}}$ is at least $\min(\alpha, \beta)$-certain".
Then using the min-combination (5), we get (Dubois and Prade, 1990a, d):

$\forall u \in U$, $\pi_x(u) \le \inf_{\beta \in (0,1]} \max(\mu_{A_{\overline{1-\beta}}}(u), 1 - \min(\alpha, \beta))$
i.e.
$\forall u \in U$, $\pi_x(u) \le \max(\inf_{\beta \in (0,1]} \max(\mu_{A_{\overline{1-\beta}}}(u), 1 - \beta), 1 - \alpha)$
i.e. $\forall u \in U$, $\pi_x(u) \le \max(\mu_A(u), 1 - \alpha)$     (14)

which extends (7) to the case where A is fuzzy.

Before applying the above model, especially (13) and (14), to the representation 
of different types of fuzzy rules in Section 4, it is important to clarify the 
relationship between the notion of possibility qualification introduced in this section, 
and possibility theory as developed until now (Zadeh, 1978a ; Dubois and Prade, 
1988). 

3. TWO COMPLEMENTARY NOTIONS OF POSSIBILITY 

The idea of possibility which has been used for qualification purposes in the
preceding section is not the same as the one underlying the definition of a possibility
measure. Indeed when "A is possible" is modelled by the inequality

$\forall u \in U$, $\mu_A(u) \le \pi_x(u)$

it means, when A is an ordinary subset, that

$\forall u \in A$, $\pi_x(u) = 1$     (15)

while the measure of possibility $\Pi$ induced by $\pi_x$, of a non-fuzzy event A, is defined
by (Zadeh, 1978a)

$\Pi(A) = \sup_{u \in A} \pi_x(u)$     (16)

and clearly, when A has a bounded support,

$\Pi(A) = 1 \Leftrightarrow \exists u \in A$, $\pi_x(u) = 1$     (17)

$\Pi(A) = 1$ only says that the statement "x is in A" is consistent with the available
information described by $\pi_x$. The discrepancy between (15) and (17) is obvious and is
expressed by the difference between the logical quantifiers '$\forall$' and '$\exists$'. This
discrepancy between possibility measures and the notion of "possible" appearing in
possibility-qualification was noticed by Zadeh (1978b), but no attempt had been made
to define the set-function underlying possibility-qualification.

The notion of possibility introduced in Section 2 rather relates to the following 
quantity, called "everywhere-possibility" or for short E-possibility of A 

$$ \Delta(A) = \inf_{u \in A} \pi_x(u) \tag{18} $$

which is such that 

$$ \text{"A is possible" in the sense of (1)} \Leftrightarrow \Delta(A) = 1 \tag{19} $$

Clearly the set functions $\Pi$ and $\Delta$ correspond respectively to a weak and a strong 
requirement of possibility, and indeed $\forall A,\ \Pi(A) \ge \Delta(A)$, or if we prefer $\forall \alpha,\ \Delta(A) \ge 
\alpha \Rightarrow \Pi(A) \ge \alpha$. In practice it corresponds to the distinction between saying that "it 
is possible that A is true", which means that there exists at least one value u in A 
which is completely possible for x (i.e. $\Pi(A) = 1$), and saying that "A is possible", which is 
short for "the range A is (completely) possible for x" and whose intended meaning is 
really that all the values in A are possible for the variable x. This latter notion of 
possibility is particularly important, as advocated in this paper, for the specification 
of possibility distributions in general and more particularly of fuzzy rules. 
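
The contrast between the weak requirement (16) and the strong requirement (18) can be made concrete on a finite universe. The following is only a minimal sketch: the universe, the distribution `pi_x` and the function names are hypothetical choices made for illustration, and sup and inf are replaced by max and min because the universe is finite.

```python
def possibility(pi, A):
    """Eq. (16): sup over u in A of pi(u); taken as 0 for the empty set."""
    return max((pi[u] for u in A), default=0.0)


def e_possibility(pi, A):
    """Eq. (18): inf over u in A of pi(u); taken as 1 for the empty set."""
    return min((pi[u] for u in A), default=1.0)


# Hypothetical possibility distribution for x on U = {a, b, c, d}.
pi_x = {"a": 1.0, "b": 1.0, "c": 0.5, "d": 0.0}

A1 = {"a", "b"}  # every value of A1 is completely possible
A2 = {"b", "c"}  # only one value of A2 is completely possible

print(possibility(pi_x, A1), e_possibility(pi_x, A1))  # 1.0 1.0
print(possibility(pi_x, A2), e_possibility(pi_x, A2))  # 1.0 0.5
```

Both sets are consistent with the available information, but only A1 is a completely possible range for x in the "everywhere" sense.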

The notion of E-possibility seems to have been largely ignored in the fuzzy set 
literature. However its counterpart in Shafer (1976)'s evidence theory is well-known ; 
it is the commonality function Q, which, by the way, is mainly used for technical 
reasons and does not seem to have received any practical interpretation until now. 
Indeed starting with a basic probability assignment m such that $\sum_A m(A) = 1$, the 
commonality of A is defined by $Q(A) = \sum_{B \supseteq A} m(B)$ ; it can be easily checked that 
the following analogue of (19) holds 

$$ Q(A) = 1 \Leftrightarrow \forall u \in A,\ Pl(\{u\}) = 1 $$

where the plausibility function Pl is defined by $Pl(C) = \sum_{B \cap C \neq \emptyset} m(B)$. This results 
from $Q(A) = 1 \Leftrightarrow \forall B$ such that $m(B) > 0$, $B \supseteq A \Leftrightarrow \forall u \in A$, $\forall B$ such that 
$m(B) > 0$, $u \in B \Leftrightarrow \forall u \in A,\ Pl(\{u\}) = 1$. Moreover $\Delta(A)$ can be put under a form 
which looks analogous to Q(A). Indeed introducing the fuzzy set F such that $\mu_F = 
\pi_x$, we have $\Delta(A) = \inf_{u \in A} \mu_F(u)$ ; hence $\forall u \in A,\ \mu_F(u) \ge \Delta(A)$ or equivalently 
$A \subseteq F_{\Delta(A)}$, and more generally $\forall \alpha \le \Delta(A),\ A \subseteq F_\alpha$. Besides $\nexists \varepsilon > 0,\ A \subseteq F_{\Delta(A)+\varepsilon}$ 
(from the definition of $\Delta(A)$). Hence 

$$ \Delta(A) = \sup\{\alpha \in (0,1],\ F_\alpha \supseteq A\} \quad \text{(with the convention } \sup \emptyset = 0\text{)} $$

A possibility distribution $\pi_x$ such that $\{\pi_x(u) \in (0,1],\ u \in U\}$ is an ordered 
finite set $M = \{\alpha_1, \ldots, \alpha_n\}$ with $\alpha_1 = 1 > \ldots > \alpha_n > \alpha_{n+1} = 0$, is equivalent to the 
basic probability assignment (Dubois and Prade, 1982) defined by 

$$ \forall i,\ m(F_{\alpha_i}) = \alpha_i - \alpha_{i+1} $$

$$ \forall B \neq F_{\alpha_i},\ m(B) = 0 $$
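
The basic probability assignment just defined can be built explicitly on a finite universe. The sketch below does so and compares, for every subset A, the commonality Q(A) with the E-possibility (18). The distribution `pi_x` and all function names are assumptions made for illustration ; the level cuts are taken as $F_\alpha = \{u : \pi_x(u) \ge \alpha\}$, and the membership degrees are multiples of 0.25 so that the floating-point sums are exact.

```python
from itertools import combinations


def consonant_mass(pi):
    """Mass function carried by the level cuts of a normalized distribution.

    Returns (focal_set, mass) pairs with m(F_alpha_i) = alpha_i - alpha_(i+1).
    """
    levels = sorted({p for p in pi.values() if p > 0}, reverse=True)
    assert levels and levels[0] == 1.0, "pi must be normalized"
    levels.append(0.0)  # alpha_(n+1) = 0
    return [
        (frozenset(u for u, p in pi.items() if p >= a), a - a_next)
        for a, a_next in zip(levels, levels[1:])
    ]


def commonality(mass, A):
    """Q(A): total mass of the focal sets containing A."""
    return sum(m for focal, m in mass if focal >= frozenset(A))


def e_possibility(pi, A):
    """Eq. (18) on a finite universe; 1 for the empty set."""
    return min((pi[u] for u in A), default=1.0)


pi_x = {"a": 1.0, "b": 0.75, "c": 0.25, "d": 0.0}
m = consonant_mass(pi_x)

# Compare the two set functions on every subset of U.
subsets = [A for r in range(len(pi_x) + 1) for A in combinations(pi_x, r)]
coincide = all(commonality(m, A) == e_possibility(pi_x, A) for A in subsets)
print(coincide)
```

On this example the focal sets are the nested cuts {a}, {a, b} and {a, b, c}, and only the cuts that contain A contribute to Q(A).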

Then, using the nestedness property $\alpha \le \beta \Rightarrow F_\alpha \supseteq F_\beta$, it can be easily seen that 
$\Delta(A) = Q(A)$, where Q is defined from the function m above, i.e. the two definitions 
coincide. More generally, for A remaining non-fuzzy, it is easy to see that 

$$ \text{A is (at least) } \alpha\text{-possible} \Leftrightarrow \forall u \in A,\ \pi_x(u) \ge \alpha \Leftrightarrow \Delta(A) \ge \alpha \tag{20} $$

which generalizes (19). The definition of the E-possibility can be extended to fuzzy 
sets still preserving the equivalence 

$$ \text{A is (at least) } \alpha\text{-possible} \Leftrightarrow \forall u \in U,\ \pi_x(u) \ge \min(\mu_A(u), \alpha) \Leftrightarrow \Delta(A) \ge \alpha \tag{21} $$

This is satisfied by taking 

$$ \Delta(A) = \inf_{u \in U} \big(\mu_A(u) \to \pi_x(u)\big) \tag{22} $$

where 

$$ a \to b = \begin{cases} 1 & \text{if } a \le b \\ b & \text{if } a > b \end{cases} $$

is a multiple-valued logic implication connective, 
known as Gödel's implication. It is easy to see that (22) reduces to (18) when A is an 
ordinary subset. Moreover (21) is ensured by the equivalence $a \to b \ge c \Leftrightarrow b \ge 
\min(a,c)$. By contrast, a lower bound $\alpha$ on the extension of the possibility measure 
$\Pi$ (defined by (16)) to a fuzzy event A, i.e. 

$$ \Pi(A) = \sup_{u \in U} \min(\mu_A(u), \pi_x(u)) \ge \alpha \tag{23} $$

is equivalent to $\forall \beta < \alpha,\ \Pi(A_\beta) \ge \alpha$, i.e. $\forall \beta < \alpha,\ \exists u \in A_\beta,\ \pi_x(u) \ge \alpha$ (see, e.g., 
Dubois and Prade, 1990a), which clearly departs from $\Delta(A) \ge \alpha$ ; the latter means that 
$\forall \beta \ge \alpha,\ \forall u \in A_\beta,\ \pi_x(u) \ge \alpha$ and $\forall \beta < \alpha,\ \forall u \in A_\beta,\ \pi_x(u) \ge \beta$. 
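
The extension (22) and the equivalence (21) can be checked numerically on a small example. This is a sketch under stated assumptions: the finite universe, the membership degrees and the function names are hypothetical, and inf is computed as a min over the universe.

```python
def godel(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def e_possibility_fuzzy(mu_A, pi):
    """Eq. (22): inf over U of mu_A(u) -> pi(u)."""
    return min(godel(mu_A[u], pi[u]) for u in pi)


def is_alpha_possible(mu_A, pi, alpha):
    """Middle condition of (21): pi(u) >= min(mu_A(u), alpha) for every u."""
    return all(pi[u] >= min(mu_A[u], alpha) for u in pi)


pi_x = {"a": 1.0, "b": 0.75, "c": 0.5, "d": 0.0}
fuzzy_A = {"a": 1.0, "b": 1.0, "c": 0.25, "d": 0.0}  # hypothetical fuzzy set
crisp_A = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}   # ordinary subset {a, b}

delta_fuzzy = e_possibility_fuzzy(fuzzy_A, pi_x)
delta_crisp = e_possibility_fuzzy(crisp_A, pi_x)

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
equivalence_21 = all(
    is_alpha_possible(fuzzy_A, pi_x, alpha) == (delta_fuzzy >= alpha)
    for alpha in grid
)
```

For the ordinary subset {a, b}, `delta_crisp` is simply the smallest degree of possibility found inside the subset, as in (18) ; elements outside the subset contribute the neutral value 1.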

We now identify the evaluation of A with which the certainty qualification 
presented above is associated. "A is $\alpha$-certain" is represented, in the general case where A is fuzzy, 
by (14), i.e. 

$$ \forall u \in U,\ \pi_x(u) \le \max(\mu_A(u), 1 - \alpha) $$

Since $c \le \max(a, 1 - b) \Leftrightarrow (1 - a) \to (1 - c) \ge b$, where $\to$ denotes Gödel's 
implication, (14) is equivalent to (Dubois and Prade, 1989a) 

$$ \mathcal{N}(A) = \inf_{u \in U} \big((1 - \mu_A(u)) \to (1 - \pi_x(u))\big) \ge \alpha \tag{24} $$

i.e. 

$$ \text{"A is } \alpha\text{-certain"} \Leftrightarrow \mathcal{N}(A) \ge \alpha. $$

When A is an ordinary subset, (24) reduces to 

$$ \mathcal{N}(A) = 1 - \Pi(\bar{A}) = \inf_{u \notin A} (1 - \pi_x(u)) \tag{25} $$

where the overbar denotes the complementation. It means that certainty qualification, 
when A is non-fuzzy, is in complete agreement with the necessity measure based on 
$\pi_x$ and defined by duality with respect to the possibility measure. However when A 
is fuzzy, the duality relation $\mathcal{N}(A) = 1 - \Pi(\bar{A})$, where $\Pi$ is extended to fuzzy events 
by (23), is no longer satisfied. When A is fuzzy, since (14) is equivalent to "Ay^p is 

at least min(a,P)-certain", we get 

$$ \mathcal{N}(A) \ge \alpha \Leftrightarrow \forall \beta \le 1,\ \mathcal{N}(A_\beta) \ge \min(\alpha, 1 - \beta) \tag{26} $$

which expresses the relation between certainty-qualification and the measure of 
necessity of the $\beta$-cuts of A. See (Dubois and Prade, 1990a) for a discussion about 
certainty-qualification in possibilistic logic with fuzzy predicates (which is in full 
agreement with $\mathcal{N}$ defined by (24)) and (Dubois and Prade, 1989a, 1990d) for the 
distinct uses of $\mathcal{N}(A)$ and $1 - \Pi(\bar{A})$, the former in certainty-qualification, the latter 
in fuzzy pattern matching. More particularly, as already said in (Dubois and Prade, 
1989a), the statement "A is certain" may either mean that we are certain that the 
possible values of x are inside A, i.e. $\pi_x \le \mu_A$ (which is captured by $\mathcal{N}(A) = 1$), or 
that we are certain that the value of x is among the elements of U which completely 
belong to A, i.e. $\mathrm{support}(\pi_x) = \{u \in U,\ \pi_x(u) > 0\} \subseteq \mathrm{core}(A) = \{u \in U,\ \mu_A(u) = 1\}$ 
(which is captured by $1 - \Pi(\bar{A}) = 1$). This latter interpretation, which is more 
demanding, is clearly related to the fuzzy filtering of fuzzily-known objects ; see 
(Dubois, Prade and Testemale, 1988). 
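
The two evaluations compared in this section, the certainty degree (24) and the dual of the possibility measure extended by (23), can be put side by side on a finite universe. The sketch below is illustrative only: the distribution, the two sets and the function names are assumptions, not taken from the cited references.

```python
def godel(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def certainty(mu_A, pi):
    """Eq. (24): inf over U of (1 - mu_A(u)) -> (1 - pi(u))."""
    return min(godel(1 - mu_A[u], 1 - pi[u]) for u in pi)


def dual_necessity(mu_A, pi):
    """1 - Pi(complement of A), with Pi extended to fuzzy events by (23)."""
    return 1 - max(min(1 - mu_A[u], pi[u]) for u in pi)


def is_alpha_certain(mu_A, pi, alpha):
    """Constraint (14): pi(u) <= max(mu_A(u), 1 - alpha) for every u."""
    return all(pi[u] <= max(mu_A[u], 1 - alpha) for u in pi)


pi_x = {"a": 1.0, "b": 0.75, "c": 0.5, "d": 0.25}
crisp_A = {"a": 1.0, "b": 1.0, "c": 0.0, "d": 0.0}
fuzzy_A = {"a": 1.0, "b": 0.5, "c": 0.5, "d": 0.0}

# Ordinary subset: the two evaluations agree, as stated by (25).
agree_on_crisp = certainty(crisp_A, pi_x) == dual_necessity(crisp_A, pi_x)

# Fuzzy set: they may differ, certainty-qualification being the more demanding.
n_fuzzy = certainty(fuzzy_A, pi_x)
dual_fuzzy = dual_necessity(fuzzy_A, pi_x)
```

On this example the two functions agree on the ordinary subset and give different values on the fuzzy set, which is the failure of duality mentioned above.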

4. REPRESENTATION OF DIFFERENT KINDS OF FUZZY RULES 

We now apply the results of Section 2 on possibility and certainty qualifications 
to the specification of fuzzy rules relating a variable x ranging on U to a variable y 
ranging on V. 

Possibility rules : A first kind of fuzzy rule corresponds to statements of the form 
"the more x is A, the more possible B is a range for y". If we interpret this rule as 
"$\forall u$, if x = u, B is a range for y is at least $\mu_A(u)$-possible", a straightforward 
application of (13) yields the following constraint on the conditional possibility 
distribution $\pi_{y|x}(\cdot, u)$ representing the rule when x = u 

$$ \forall u \in U,\ \forall v \in V,\ \min(\mu_A(u), \mu_B(v)) \le \pi_{y|x}(v,u). \tag{27} $$

Certainty rules : A second kind of fuzzy rule corresponds to statements of the form 
"the more x is A, the more certain y lies in B". Interpreting the rule as "$\forall u$, if x = u, 
y lies in B is at least $\mu_A(u)$-certain", by application of (14) we get the following 
constraint for the conditional possibility distribution modelling the rule 

$$ \forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \le \max(\mu_B(v), 1 - \mu_A(u)). \tag{28} $$


In the particular case where A is an ordinary subset and where we know that, if x is 
in A, B is both a possible and a certain range for y, (27) and (28) yield 

$$ \begin{cases} \forall u \in A,\ \pi_{y|x}(v,u) = \mu_B(v) \\ \forall u \notin A,\ \pi_{y|x}(v,u) \text{ is completely unspecified.} \end{cases} \tag{29} $$

This corresponds to the usual modelling of a fuzzy rule with a non-fuzzy condition 
part. Note that B may be any kind of fuzzy set in (27), (28) and then in (29). Thus B 
may itself include some uncertainty ; for instance the membership function of B 
may be of the form $\mu_B = \max(\mu_{B^*}, 1 - \beta)$ in order to express that when x is A, B* 
is the (fuzzy) range of y with a certainty $\beta$ (any value outside the support of B* 
remains a possible value for y with a degree equal to $1 - \beta$) ; we may even have an 
unnormalized possibility distribution which can be put under the form $\mu_B = 
\min(\mu_{B^*}, \alpha)$ if the possibility that y takes its value in V is bounded from above by $\alpha$ 
(i.e. there is a possibility $1 - \alpha$ that y has no value in V, when x takes its value in 
A). 
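
The way the constraints (27) and (28) bracket the conditional distribution, and collapse to (29) for a non-fuzzy condition, can be sketched as follows. The fuzzy set `mu_B` and the helper names are hypothetical ; the sketch only evaluates the two bounds for a given degree $\mu_A(u)$.

```python
def possibility_rule_lower(mu_a, mu_b):
    """Lower bound on pi_{y|x}(v, u) from a possibility rule, Eq. (27)."""
    return min(mu_a, mu_b)


def certainty_rule_upper(mu_a, mu_b):
    """Upper bound on pi_{y|x}(v, u) from a certainty rule, Eq. (28)."""
    return max(mu_b, 1 - mu_a)


mu_B = {"v1": 0.0, "v2": 0.5, "v3": 1.0}  # hypothetical fuzzy conclusion B


def interval(mu_a):
    """(lower, upper) bracket on pi_{y|x}(., u) when mu_A(u) = mu_a and both rules hold."""
    return {
        v: (possibility_rule_lower(mu_a, b), certainty_rule_upper(mu_a, b))
        for v, b in mu_B.items()
    }


inside = interval(1.0)   # u in A: both bounds coincide with mu_B, first line of (29)
outside = interval(0.0)  # u not in A: bounds are 0 and 1, nothing is specified
partial = interval(0.5)  # fuzzy condition: a genuine interval remains
```

With an intermediate degree such as 0.5 the rule leaves room between the two bounds, which is why the two kinds of rules are kept distinct when A is fuzzy.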

Gradual rules : This third kind of fuzzy rule has been discussed in (Dubois and Prade, 
1989a, 1990b, d). Gradual rules correspond to statements of the form "the more x is 
A, the more y is B". Statements involving "the less" in place of "the more" are 
easily obtained by changing A or B into their complements $\bar{A}$ and $\bar{B}$ due to the 
equivalence between "the more x is A" and "the less x is $\bar{A}$" (with $\mu_{\bar{A}} = 1 - \mu_A$). 


More precisely, the intended meaning of a gradual rule can be understood in the 
following way : "the greater the degree of membership of the value of x to the fuzzy 
set A and the more the value of y is considered to be in relation (in the sense of the 
rule) with the value of x, the greater the degree of membership to B should be for this 
value of y", i.e. 

$$ \forall u \in U,\ \min(\mu_A(u), \pi_{y|x}(v,u)) \le \mu_B(v) \tag{30} $$

or, using the equivalence $\min(a,t) \le b \Leftrightarrow t \le a \to b$ where $\to$ denotes Gödel's 
implication, 

$$ \forall u \in U,\ \pi_{y|x}(v,u) \le \mu_A(u) \to \mu_B(v) = \begin{cases} 1 & \text{if } \mu_A(u) \le \mu_B(v) \\ \mu_B(v) & \text{if } \mu_A(u) > \mu_B(v) \end{cases} \tag{31} $$

(31) can be equivalently written 

$$ \forall u \in U,\ \pi_{y|x}(v,u) \le \max\big(\mu_{[\mu_A(u),1]}(\mu_B(v)),\ \mu_B(v)\big) = \mu_{[\mu_A(u),1] \cup T}(\mu_B(v)) \tag{32} $$

where $\mu_{[\mu_A(u),1]}$ is the characteristic function of the interval $[\mu_A(u),1]$ and where T 
is the fuzzy set of [0,1], defined by $\forall t \in [0,1],\ \mu_T(t) = t$, which models the fuzzy 
truth-value 'true' in fuzzy logic (Zadeh, 1978b). If we remember that "x is A 
is $\tau$-true", where $\tau$ is a fuzzy truth-value modelled by the fuzzy set $\tau$ of [0,1], is 
represented by the possibility distribution (Zadeh, 1978b) 

$$ \forall u \in U,\ \pi_x(u) = \mu_\tau(\mu_A(u)) \tag{33} $$

(note that $\tau = T$ yields the basic assignment (12)), we can interpret the meaning of 
gradual rules in the following way using (32) : $\forall u \in U$, if x = u then y is B is at 
least $\mu_A(u)$-true. The membership function of the fuzzy truth-value 'at least $\alpha$-true' is 
pictured in Figure 1.a. As can be seen, it is not a crisp "at least $\alpha$-true" (which 
would correspond to the ordinary subset $[\alpha,1]$), but a fuzzy one in agreement with 
truth-qualification in the sense of (33) ; indeed "at least 1-true" corresponds to the 
fuzzy truth-value T. 

If we are only looking for the crisp possibility distributions $\pi_{y|x}$ (i.e. the {0,1}-
valued ones) which satisfy (30), because we assume that it is a crisp relation between 
y and x which underlies the rule "the more x is A, the more y is B", then we obtain 
the constraint 

$$ \forall u \in U,\ \pi_{y|x}(v,u) \le \begin{cases} 1 & \text{if } \mu_A(u) \le \mu_B(v) \\ 0 & \text{if } \mu_A(u) > \mu_B(v) \end{cases} \tag{34} $$

which expresses that the rule is now viewed as meaning : $\forall u \in U$, if x = u then y is B 
is at least $\mu_A(u)$-true, where the truth-qualification is understood in a crisp sense. The 
reader is referred to (Dubois and Prade, 1990b) for a discussion of this kind of rule. 
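
The passage from (30) to (31), and its crisp counterpart, can be verified exhaustively on a grid of degrees. This is a sketch only: the grid is an arbitrary choice, and the crisp bound used here (1 if a ≤ b, 0 otherwise) is our reading of the constraint (34), whose second case is an assumption derived from (30) restricted to {0,1}-valued distributions.

```python
def godel(a, b):
    """Goedel implication: 1 if a <= b, otherwise b."""
    return 1.0 if a <= b else b


def crisp_bound(a, b):
    """Assumed reading of (34): 1 if a <= b, otherwise 0."""
    return 1.0 if a <= b else 0.0


grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# (30) <=> (31): min(a, t) <= b holds exactly when t <= (a -> b).
equivalent = all(
    (min(a, t) <= b) == (t <= godel(a, b))
    for a in grid for b in grid for t in grid
)

# With t restricted to {0, 1}, the largest admissible t is the crisp bound.
crisp_ok = all(
    max(t for t in (0.0, 1.0) if min(a, t) <= b) == crisp_bound(a, b)
    for a in grid for b in grid
)
```

Here a stands for $\mu_A(u)$, b for $\mu_B(v)$ and t for the unknown degree $\pi_{y|x}(v,u)$.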

A fourth type of fuzzy rules : The inequality (30) looks like (27) when exchanging 
$\mu_B(v)$ and $\pi_{y|x}(v,u)$, while (31), which is equivalent to (30), is analogous to (28) in 
the sense that in both cases $\pi_{y|x}$ is bounded from above by a multiple-valued logic 
implication function (in (28) it is Dienes' implication, $a \to b = \max(1 - a, b)$, which 
appears). This leads us to consider the inequality constraint obtained from (28) by 
exchanging $\mu_B(v)$ and $\pi_{y|x}(v,u)$, i.e. 

$$ \forall u \in U,\ \forall v \in V,\ \max(\pi_{y|x}(v,u), 1 - \mu_A(u)) \ge \mu_B(v) \tag{35} $$

This corresponds to a fourth kind of fuzzy rules, of which we now investigate the 
intended meaning. (35) is perhaps more easily understood by taking the complement 
to 1 of each side of the inequality, i.e. 

$$ \forall u \in U,\ \forall v \in V,\ 1 - \mu_B(v) \ge \min(\mu_A(u), 1 - \pi_{y|x}(v,u)) $$

which can be interpreted as "the more x is A and the less y is related to x, the less y 
is B", which corresponds to a new type of gradual rule. Using the 
equivalence $\min(a, 1 - t) \le 1 - b \Leftrightarrow 1 - t \le a \to (1 - b) \Leftrightarrow t \ge 1 - (a \to (1 - b))$, 
where $\to$ is Gödel's implication, we can still write (35) under the form 

$$ \forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \ge \begin{cases} 0 & \text{if } \mu_A(u) + \mu_B(v) \le 1 \\ \mu_B(v) & \text{if } \mu_A(u) + \mu_B(v) > 1 \end{cases} = \min\big(\mu_{(1-\mu_A(u),1]}(\mu_B(v)),\ \mu_B(v)\big) = \mu_{(1-\mu_A(u),1] \cap T}(\mu_B(v)) \tag{36} $$


Unsurprisingly, the lower bound of $\pi_{y|x}(v,u)$ which is obtained is a multiple-valued 
logic conjunction function of $\mu_A(u)$ and $\mu_B(v)$ (indeed $f(a,b) = 0$ if $a + b \le 1$ and 
$f(a,b) = b$ otherwise, is such that $f(0,0) = f(0,1) = f(1,0) = 0$ and $f(1,1) = 1$). From 
(36) we see that this type of gradual rules can be interpreted in the following way, 
using (33) : $\forall u \in U$, if x = u then y is B is at least $(1 - \mu_A(u))$-true. The 
membership function of the corresponding fuzzy truth-value "at least $(1 - \alpha)$-true" is 
pictured in Figure 1.d. As can be observed by comparing Figures 1.a and 1.d, they 
correspond to two points of view in (fuzzy) truth-qualification of level $\alpha$, one 
insisting on the complete possibility of degrees of truth greater than $\alpha$ (core point of 
view), the other insisting on the complete impossibility of degrees of truth less than or 
equal to $1 - \alpha$ (support point of view). 

[Figure 1 : Four basic types of fuzzy truth-values. Panels : a. "at least $\alpha$-true" (core point of view) ; b. "at least $\alpha$-certainly true" ; c. "at least $\alpha$-possibly true" ; d. "at least $(1 - \alpha)$-true" (support point of view)] 

Figures 1.b and 1.c picture the fuzzy truth-values "at least $\alpha$-certainly true" and 
"at least $\alpha$-possibly true", whose respective membership functions are 
$\max(\mu_T, 1 - \alpha)$ and $\min(\mu_T, \alpha)$. It can be seen on Figure 1 and formally checked 
that the four fuzzy truth-values we have introduced satisfy the two duality relations 

$$ \text{'at least } \alpha\text{-certainly true'} = \mathrm{comp} \circ \mathrm{ant}(\text{'at least } \alpha\text{-possibly true'}) \tag{37} $$

$$ \text{'at least } \alpha\text{-true (core p. of v.)'} = \mathrm{comp} \circ \mathrm{ant}(\text{'at least } (1-\alpha)\text{-true (support p. of v.)'}) \tag{38} $$

where comp and ant are two transformations reflecting the ideas of 
complementation and antonymy respectively, and defined by $\mathrm{comp}(f(t)) = 1 - f(t)$ and 
$\mathrm{ant}(f(t)) = f(1 - t)$, $\forall t \in [0,1]$ and f ranging in [0,1]. Note that $\mathrm{comp} \circ \mathrm{ant} = 
\mathrm{ant} \circ \mathrm{comp}$. Note that when there are only two degrees of truth, 0 (false) and 1 (true), 
"at least $\alpha$-certainly true" corresponds to the possibility distribution $\pi(\text{true}) = 1$, 
$\pi(\text{false}) = 1 - \alpha$ and "at least $\alpha$-possibly true" to $\pi(\text{false}) = 0$ and $\pi(\text{true}) = \alpha$, while 
the two other (fuzzy) truth-values make no sense. Dually when there are only two 
degrees of possibility, 0 (complete impossibility) and 1 (complete possibility), then 
the representations of "at least $\alpha$-true" and "at least $(1 - \alpha)$-true" respectively coincide 
with the ordinary subsets $[\alpha,1]$ and $(1 - \alpha, 1]$. 
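
The two duality relations (37) and (38) can be checked mechanically. The sketch below encodes the four fuzzy truth-values as functions of the degree of truth t and compares both sides on a grid ; the function names and the grid are our own choices, and the support-oriented truth-value follows the lower bound obtained in (36).

```python
def at_least_true_core(alpha):
    """Goedel implication alpha -> t (Figure 1.a)."""
    return lambda t: 1.0 if t >= alpha else t


def at_least_certainly_true(alpha):
    """max(t, 1 - alpha) (Figure 1.b)."""
    return lambda t: max(t, 1 - alpha)


def at_least_possibly_true(alpha):
    """min(t, alpha) (Figure 1.c)."""
    return lambda t: min(t, alpha)


def at_least_true_support(alpha):
    """0 if alpha + t <= 1, otherwise t (Figure 1.d, 'at least (1 - alpha)-true')."""
    return lambda t: 0.0 if alpha + t <= 1 else t


def comp_ant(f):
    """Composition comp o ant: t -> 1 - f(1 - t)."""
    return lambda t: 1 - f(1 - t)


grid = [0.0, 0.25, 0.5, 0.75, 1.0]

duality_37 = all(
    at_least_certainly_true(a)(t) == comp_ant(at_least_possibly_true(a))(t)
    for a in grid for t in grid
)
duality_38 = all(
    at_least_true_core(a)(t) == comp_ant(at_least_true_support(a))(t)
    for a in grid for t in grid
)
```

The grid uses multiples of 0.25 so that the comparisons are exact in floating point ; with arbitrary degrees a tolerance would be needed.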

The four fuzzy truth-values pictured in Figure 1 (with $\alpha = \mu_A(u)$) can be viewed 
as representing modifiers $\varphi$ (in the sense of Zadeh (1972)) which modify the fuzzy set 
B into B* such that $\mu_{B^*} = \varphi(\mu_B)$, in order to specify the subset of interest for y in 
the various rules when x = u. To summarize, in the case of 

- possibility rules, the possibility distribution $\pi_{y|x}(\cdot,u)$ is bounded from below by 
$\varphi(\mu_B)$ with $\varphi(t) = \min(\mu_A(u), t)$, i.e. B is truncated up to the height $\alpha = \mu_A(u)$ ; 

- certainty rules, the possibility distribution $\pi_{y|x}(\cdot,u)$ is bounded from above by 
$\varphi(\mu_B)$ with $\varphi(t) = \max(t, 1 - \mu_A(u))$, i.e. B is drowned in a level of indetermination 
$1 - \alpha$ ; 

- gradual rules (core point of view), the possibility distribution $\pi_{y|x}(\cdot,u)$ is bounded 
from above by $\varphi(\mu_B)$ with $\varphi(t) = \mu_A(u) \to t$ (where $\to$ denotes Gödel's 
implication), i.e. the core of B is enlarged ; 

- gradual rules (support point of view), the possibility distribution $\pi_{y|x}(\cdot,u)$ is 
bounded from below by $\varphi(\mu_B)$ with $\varphi(t) = 0$ if $\mu_A(u) + t \le 1$ and $\varphi(t) = t$ 
otherwise, i.e. the support of B is diminished, truncated. 

N.B. : The similarity between (27) and (30) suggests that "the more x is A, the more 
y is B", where y is in relation R with x, can be understood as meaning that $\forall u \in U$, 
B represents the statement "R(u) is a range for y which is at least possible at the 
degree $\mu_A(u)$", where R(u) is the fuzzy set of elements in V in relation with u. 

5. INFERENCE WITH FUZZY RULES 
5.1. Parallel Rules with a Precise Input 

Depending on the kind of constraint induced by their representation (the 
possibility distribution is bounded from below or from above) the combination of 
several rules in parallel of the same type will be performed differently. Namely for 
certainty rules and for gradual rules focusing on cores, described by means of pairs 
$(A_i, B_i)$, i = 1,n we have 

$$ \forall i = 1,n,\ \forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \le \mu_{A_i}(u) \to \mu_{B_i}(v) $$

where $\to$ denotes Gödel's or Dienes' implication ; then by combination we get 

$$ \forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \le \min_{i=1,n} \big(\mu_{A_i}(u) \to \mu_{B_i}(v)\big) \tag{39} $$

while with possibility rules and gradual rules focusing on supports, we have 

$$ \forall i = 1,n,\ \forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \ge \mu_{A_i}(u) \,\&\, \mu_{B_i}(v) $$

where & denotes the min conjunction or the non-symmetrical one introduced above ; 
which yields 

$$ \forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \ge \max_{i=1,n} \big(\mu_{A_i}(u) \,\&\, \mu_{B_i}(v)\big) \tag{40} $$

The existence of two models of combination of systems of fuzzy rules has been 
pointed out by several authors including Baldwin and Pilsworth (1979), Tanaka et al. 
(1982), Di Nola et al. (1985), when considering special cases of implication 
functions $\to$ in contrast with the min-conjunction for combining the fuzzy sets $A_i$ 
and $B_i$. 

Figures 2.a and 2.b illustrate the behaviour of the four types of rules in the case 
of two parallel rules relating $A_1$ and $B_1$ on the one hand and $A_2$ and $B_2$ on the other 
hand when the value of x is precisely known, i.e. $\pi_x = \mu_{A'}$ with $A' = \{u_0\}$. In Figure 
2.a, $x = u_0$ perfectly satisfies the requirements modelled by $A_1$ and $A_2$, i.e., 
$\mu_{A_1}(u_0) = 1$ and $\mu_{A_2}(u_0) = 1$, and we obtain as a conclusion for y, with $\pi_y = \mu_{B'}$, 

$$ \forall v \in V,\ \mu_{B'}(v) \le \min(\mu_{B_1}(v), \mu_{B_2}(v)) \quad \text{(conjunction of the conclusions)} $$

for certainty rules and for gradual rules (focusing on cores) due to (39), and 

$$ \forall v \in V,\ \mu_{B'}(v) \ge \max(\mu_{B_1}(v), \mu_{B_2}(v)) \quad \text{(disjunction of the conclusions)} $$

for possibility rules and for gradual rules (focusing on supports) due to (40). The fact 
that we obtain $B' \supseteq B_1 \cup B_2$ when $\{u_0\} \subseteq A_1 \cap A_2$ for possibility rules, for 
instance, should not be surprising ; indeed we have to remember that each rule 
expresses that "the more x is $A_i$, the greater the level of possibility of $B_i$ as a 
possible range for y" for i = 1,2 ; then since y may lie in both $B_1$ and $B_2$, $B_1 \cup B_2$ 
should be a possible range for y. In Figure 2.b, we have $\mu_{A_2}(u_0) = 1$ again, but now 
$\mu_{A_1}(u_0) = \alpha < 1$. The difference between certainty rules and gradual rules focusing 
on cores appears clearly : for certainty rules, the intersection of $B_2$ with $B_1$ is 
pervaded with a level of uncertainty $1 - \alpha$ (i.e. $\min(\mu_{B_2}, \max(\mu_{B_1}, 1 - \alpha))$), while 
the upper bound of B' for the gradual rules stays between $B_1$ and $B_2$, overlapping a 
little more $B_2$. Similarly the difference between possibility rules and gradual rules 
focusing on supports also appears ; for the former we obtain $\mu_{B'} \ge 
\max(\mu_{B_2}, \min(\mu_{B_1}, \alpha))$ which expresses that the values in $B_1$ are regarded as a priori 
less possible than the ones in $B_2$ ; for the latter some values in the support of $B_1$ are 
considered as potentially impossible. 

[Figures 2.a and 2.b : precise input $A' = \{u_0\}$ ; surviving panel titles : "Gradual rules (focusing on cores)", "Certainty rules"] 
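
The combinations (39) and (40) for a precise input can be sketched on discrete universes. Everything below is illustrative: the membership vectors are hypothetical, and we arbitrarily let the first rule be triggered at degree 0.5 and the second one fully, which is one way of instantiating a situation where only one condition is perfectly satisfied.

```python
def godel(a, b):
    """Goedel implication (gradual rules focusing on cores)."""
    return 1.0 if a <= b else b


def dienes(a, b):
    """Dienes' implication (certainty rules)."""
    return max(1 - a, b)


def support_conj(a, b):
    """Non-symmetrical conjunction of (36): 0 if a + b <= 1, otherwise b."""
    return 0.0 if a + b <= 1 else b


def upper_bound(implication, degrees, conclusions):
    """Eq. (39) at x = u0: min over the rules of mu_Ai(u0) -> mu_Bi(v)."""
    return [
        min(implication(a, b[k]) for a, b in zip(degrees, conclusions))
        for k in range(len(conclusions[0]))
    ]


def lower_bound(conjunction, degrees, conclusions):
    """Eq. (40) at x = u0: max over the rules of mu_Ai(u0) & mu_Bi(v)."""
    return [
        max(conjunction(a, b[k]) for a, b in zip(degrees, conclusions))
        for k in range(len(conclusions[0]))
    ]


# Hypothetical conclusions on V = (v1, ..., v5); B1 lies to the left of B2.
B1 = [1.0, 1.0, 0.5, 0.0, 0.0]
B2 = [0.0, 0.0, 0.5, 1.0, 1.0]
degrees = [0.5, 1.0]  # assumed values of mu_A1(u0) and mu_A2(u0)

certainty = upper_bound(dienes, degrees, [B1, B2])
gradual_core = upper_bound(godel, degrees, [B1, B2])
possibility = lower_bound(min, degrees, [B1, B2])
gradual_support = lower_bound(support_conj, degrees, [B1, B2])
```

The first two results are upper bounds on the conclusion and the last two are lower bounds, so they are not directly comparable with one another ; only rules of the same family should be combined this way.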

5.2. Generalized Modus Ponens with One Rule 


In section 4, when studying the representation of the four types of rules 
considered in the paper, we have described the response B' of a rule to a precise input 
$A' = \{u_0\}$ under the form $\mu_{B'} = \varphi(\mu_B)$ where $\varphi$ represents a particular fuzzy truth 
value which acts as a modifier. We now consider the more general situation of the 
generalized modus ponens (Zadeh, 1979), namely 

x is A' 
<rule relating x is A with y is B> 
y is B' 

As usual, and in agreement with (12), "x is A'" will be understood as 

$$ \forall u \in U,\ \pi_x(u) = \mu_{A'}(u) $$

while the rule, depending on the case, is represented 

by $\forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \le \mu_A(u) \to \mu_B(v)$ (case I) 

or by $\forall u \in U,\ \forall v \in V,\ \pi_{y|x}(v,u) \ge \mu_A(u) \,\&\, \mu_B(v)$ (case II) 

We apply the combination/projection principle (Zadeh, 1979 ; see Dubois and Prade, 
1990d for a discussion), i.e. here 

$$ \forall v \in V,\ \pi_y(v) = \sup_{u \in U} \min(\pi_x(u), \pi_{y|x}(v,u)) \tag{41} $$

Thus we get, with $\forall v,\ \mu_{B'}(v) = \pi_y(v)$, 

$$ \forall v \in V,\ \mu_{B'}(v) \le \sup_{u \in U} \min\big(\mu_{A'}(u),\ \mu_A(u) \to \mu_B(v)\big) \quad \text{(case I)} $$

$$ \forall v \in V,\ \mu_{B'}(v) \ge \sup_{u \in U} \min\big(\mu_{A'}(u),\ \mu_A(u) \,\&\, \mu_B(v)\big) \quad \text{(case II)} $$

Let us first consider the two kinds of rules belonging to case I. 

For certainty rules : we obtain 

$$ \forall v \in V,\ \mu_{B'}(v) \le \sup_{u \in U} \min\big(\mu_{A'}(u), \max(1 - \mu_A(u), \mu_B(v))\big) = \max(\mu_B(v), 1 - N(A ; A')) \tag{42} $$

provided that A' is normalized and where 

$$ N(A ; A') = \inf_{u \in U} \max(\mu_A(u), 1 - \mu_{A'}(u)) $$

is the dual of the possibility measure $\Pi(A ; A')$ of the fuzzy event A defined by (23) 
(with $\pi_x = \mu_{A'}$) ; $N(A ; A')$ is thus equal to $1 - \Pi(\bar{A} ; A')$ and plays a basic role 
in fuzzy pattern matching as briefly recalled at the end of section 3. The inequality 
(42) expresses the following. Our lack of certainty that all the values restricted by 
$\mu_{A'}$ are (highly) compatible with the requirement modelled by $\mu_A$ induces a 
possibility at most equal to $1 - N(A ; A')$ that the value of y is outside the support 
of B. In other words (42) means that it is $N(A ; A')$-certain that y is restricted by B. 
For gradual rules focusing on cores, we have 

$$ \forall v \in V,\ \mu_{B'}(v) \le \sup_{u \in U} \min\big(\mu_{A'}(u),\ \mu_A(u) \to \mu_B(v)\big) $$

where $\to$ is Gödel's implication, as in (31). From this it can be concluded that the least upper bounds derivable from the above 
inequality are given by (Dubois and Prade, 1988) : 

• $\mu_{B'}(v) \le 1$, $\forall v \in \{v \in V,\ \mu_B(v) \ge \inf_{u \in U}\{\mu_A(u) \mid \mu_{A'}(u) = 1\}\}$ 

• $\mu_{B'}(v) \le \sup_{u \in U}\{\mu_{A'}(u) \mid \mu_A(u) = 0\} = \Pi(\overline{\mathrm{support}(A)} ; A')$, 
$\forall v \in \{v \in V,\ \mu_B(v) = 0\}$ 

This shows that when A' is no longer a singleton, the enlarging effect of the core of 
B may be increased and a non-zero possibility $\Pi(\overline{\mathrm{support}(A)} ; A')$ may be obtained 
for values outside the support of B, for the possibility distribution restricting y. The 
level of possibility $\Pi(\overline{\mathrm{support}(A)} ; A')$ acknowledges the fact that some possible 
values of x (in A') are not consistent at all with A. 

We now consider the rules belonging to case II. 

For gradual rules focusing on supports, we have 

$$ \forall v \in V,\ \mu_{B'}(v) \ge \sup_{u \in U} \min\big(\mu_{A'}(u),\ \mu_A(u) \,\&\, \mu_B(v)\big) $$

where & is the non-symmetrical conjunction appearing in (36). Notable greatest lower bounds derivable from the above inequality are 

• $\mu_{B'}(v) \ge 0$, $\forall v \in \{v \in V,\ \mu_B(v) \le 1 - \sup\{\mu_A(u) \mid \mu_{A'}(u) > 0\}\}$ 

• $\mu_{B'}(v) \ge \sup_{u \in U}\{\mu_{A'}(u) \mid \mu_A(u) > 0\} = \Pi(\mathrm{support}(A) ; A')$, 
$\forall v \in \{v \in V,\ \mu_B(v) = 1\}$ 

As can be seen, the truncation effect of the support of B may decrease, while the 
height of the possibility distribution attached to y, equal to $\Pi(\mathrm{support}(A) ; A')$, may 
be less than 1 (without being 0), when A' is no longer a singleton. When some 
values compatible with A' do not belong at all to A, the lower bound on the level of 
possibility for values in B to be the value of y decreases. 

For possibility rules, we have 

$$ \forall v \in V,\ \mu_{B'}(v) \ge \sup_{u \in U} \min(\mu_{A'}(u), \mu_A(u), \mu_B(v)) = \min(\mu_B(v), \Pi(A ; A')) \tag{43} $$

It expresses that as soon as there is no value in A' fully consistent with A, B is only 
considered as an $\alpha$-possible range for y (in the sense of (13)), with $\alpha = \Pi(A ; A')$. 
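
The closed forms (42) and (43) can be compared with a direct sup-min composition (41) on discrete universes. The sketch below is only illustrative: the membership vectors are hypothetical, A' is chosen normalized as (42) requires, and the function names are ours.

```python
def possibility_match(mu_A, mu_Ap):
    """Pi(A ; A') = sup_u min(mu_A(u), mu_A'(u)), i.e. (23) with pi_x = mu_A'."""
    return max(min(a, ap) for a, ap in zip(mu_A, mu_Ap))


def necessity_match(mu_A, mu_Ap):
    """N(A ; A') = inf_u max(mu_A(u), 1 - mu_A'(u))."""
    return min(max(a, 1 - ap) for a, ap in zip(mu_A, mu_Ap))


def gmp_certainty(mu_A, mu_B, mu_Ap):
    """Upper bound on mu_B': sup-min composition with Dienes' implication."""
    return [
        max(min(ap, max(1 - a, b)) for a, ap in zip(mu_A, mu_Ap))
        for b in mu_B
    ]


def gmp_possibility(mu_A, mu_B, mu_Ap):
    """Lower bound on mu_B': sup-min composition with the min conjunction."""
    return [
        max(min(ap, a, b) for a, ap in zip(mu_A, mu_Ap))
        for b in mu_B
    ]


# Hypothetical membership vectors on U = (u1, ..., u4) and V = (v1, v2, v3).
mu_A = [1.0, 0.75, 0.25, 0.0]
mu_Ap = [0.5, 1.0, 0.5, 0.0]  # normalized fuzzy input A'
mu_B = [0.0, 0.5, 1.0]

N = necessity_match(mu_A, mu_Ap)
Pi = possibility_match(mu_A, mu_Ap)
upper = gmp_certainty(mu_A, mu_B, mu_Ap)
lower = gmp_possibility(mu_A, mu_B, mu_Ap)
```

In this example `upper` equals $\max(\mu_B, 1 - N)$ and `lower` equals $\min(\mu_B, \Pi)$ componentwise, so computing the two matching degrees once is enough to obtain both bounds.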

5.3. Parallel Rules with a Fuzzy Input 

Lastly, we consider the general problem of the generalized modus ponens in the 
face of a collection of n rules of the same type, i.e. the pattern 

x is A' 
<rule relating x is $A_i$ with y is $B_i$> i = 1,n 
y is B' 

At the theoretical level, the solution is straightforward, namely 

$$ \forall v \in V,\ \mu_{B'}(v) \le \sup_{u \in U} \min\big(\mu_{A'}(u),\ \min_{i=1,n} (\mu_{A_i}(u) \to \mu_{B_i}(v))\big) \quad \text{(case I)} \tag{44} $$

$$ \forall v \in V,\ \mu_{B'}(v) \ge \sup_{u \in U} \min\big(\mu_{A'}(u),\ \max_{i=1,n} (\mu_{A_i}(u) \,\&\, \mu_{B_i}(v))\big) \quad \text{(case II)} \tag{45} $$

However at the practical level, the computation of these expressions raises problems 
for some types of rules. There is no difficult problem for (45). Indeed it can be 
checked that (45) is equivalent to 

$$ \forall v \in V,\ \mu_{B'}(v) \ge \max_{i=1,n} \sup_{u \in U} \min\big(\mu_{A'}(u),\ \mu_{A_i}(u) \,\&\, \mu_{B_i}(v)\big) $$

i.e., for possibility rules we get 

$$ \forall v \in V,\ \mu_{B'}(v) \ge \max_{i=1,n} \min(\mu_{B_i}(v), \Pi(A_i ; A')) \tag{46} $$

For certainty rules the following upper bound which can be derived from (44) 
(assuming that A' is normalized) is no longer the best one, 

$$ \forall v \in V,\ \mu_{B'}(v) \le \min_{i=1,n} \max(\mu_{B_i}(v), 1 - N(A_i ; A')) \tag{47} $$

Indeed assume n = 2, and that $A_1$, $A_2$, A' are distinct non-fuzzy subsets, and 
$A' = A_1 \cup A_2$, with $A' \not\subseteq A_1$, $A' \not\subseteq A_2$ ; then $N(A_i ; A') = 0$, i = 1,2, and we get the 
trivial result $\forall v,\ \mu_{B'}(v) \le 1$ by (47), although using (44) it would be possible to 
conclude that $\forall v,\ \mu_{B'}(v) \le \max(\mu_{B_1}(v), \mu_{B_2}(v))$, which is a satisfactory result. It 
emphasizes the fact that the rules should be jointly processed as in (44) in order to 
get the most accurate result : it is not the case in (47) where the conclusions obtained 
from x is A' and from each rule i are combined. Note that (46) and (47) are 
respectively a weighted max and a weighted min combination ; when $\forall i$, 
$\Pi(A_i ; A') = 1$, $N(A_i ; A') = 1$, (46) and (47) yield the union and the intersection of 
the $B_i$'s respectively, see Dubois, Prade and Testemale (1988) for instance, for more 
details. The case of implication-based gradual rules raises similar problems. The 
processing of a collection of parallel gradual rules (focusing on cores) has been 
investigated in (Dubois, Martin-Clouaire and Prade, 1988) and in Martin-Clouaire 
(1988) to which the reader is referred. It is possible from the collection of rules to 
build a new rule which, when applied to A', yields the optimal result, i.e. the value 
of the upper bound expressed by (44) ; this new rule summarizes the knowledge 
useful in the collection of rules for dealing with the fact "x is A'". 
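
The gap between the joint processing (44) and the rule-by-rule bound (47) for certainty rules can be reproduced on a three-element universe. The sets below are a hypothetical instance of the situation just described (two distinct crisp conditions whose union is the input) ; the conclusions and function names are assumptions for illustration.

```python
def dienes(a, b):
    """Dienes' implication, used for certainty rules."""
    return max(1 - a, b)


def necessity_match(mu_A, mu_Ap):
    """N(A ; A') = inf_u max(mu_A(u), 1 - mu_A'(u))."""
    return min(max(a, 1 - ap) for a, ap in zip(mu_A, mu_Ap))


def joint_bound(rules, mu_Ap, n_v):
    """Eq. (44): combine the rules first, then compose with the input."""
    return [
        max(
            min(ap, min(dienes(mu_A[k], mu_B[j]) for mu_A, mu_B in rules))
            for k, ap in enumerate(mu_Ap)
        )
        for j in range(n_v)
    ]


def rulewise_bound(rules, mu_Ap, n_v):
    """Eq. (47): infer with each rule separately, then intersect."""
    return [
        min(max(mu_B[j], 1 - necessity_match(mu_A, mu_Ap)) for mu_A, mu_B in rules)
        for j in range(n_v)
    ]


# Crisp conditions on U = (u1, u2, u3): A1 = {u1, u2}, A2 = {u2, u3}, A' = A1 u A2.
A1, A2 = [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]
A_prime = [1.0, 1.0, 1.0]
# Hypothetical fuzzy conclusions on V = (v1, v2, v3).
B1, B2 = [1.0, 0.5, 0.0], [0.0, 0.25, 0.5]
rules = [(A1, B1), (A2, B2)]

joint = joint_bound(rules, A_prime, 3)
rulewise = rulewise_bound(rules, A_prime, 3)
```

Here `rulewise` is identically 1, i.e. uninformative, whereas `joint` coincides with the union of the two conclusions.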

Generally speaking, we have to define the consistency and the non-redundancy of 
the set of fuzzy rules, and this leads us to put some constraints on the coverage of U by 
the $A_i$'s (see the first of the two above-mentioned references for definitions of these 
notions). Clearly further research is needed for a complete investigation of the 
practical processing of a collection of rules of a given type, also including the 
problem of compound condition parts in the rules, which have not been considered 
here. Figure 2.c exhibits different behaviors of the four types of rules in the 
case n = 2 where $A' = A_1 \cap A_2$. We notice that we obtain $B' \subseteq B_1 \cap B_2$ with 
gradual rules focusing on cores, which confirms the "interpolation" flavor of this 
behavior : if A' is between $A_1$ and $A_2$ (in the sense of the intersection), then the 
possible values of y are restricted by a fuzzy set in between $B_1$ and $B_2$ ; a level of 
uncertainty equal to 0.5 appears for certainty rules ; this is due to the fact that with 
continuous membership functions $N(A ; A) = 0.5$ as soon as A is a fuzzy set 
(when $A' = A_1 \cap A_2$, we are not completely sure that x belongs to the core 
of $A_1$ and to the core of $A_2$). For the two types of rules corresponding to 
case II, we obtain $B_1 \cup B_2 \subseteq B'$ as expected (since here $\Pi(A_i ; A') = 1$ and 
$\sup\{\mu_{A_i}(u) \mid \mu_{A'}(u) > 0\} = 1$, for i = 1,2). 

As a final remark in this section, note that we may think of using two types of 
rules simultaneously, especially for certainty and possibility rules, since it can be 
checked that the two corresponding inequalities constraining $\pi_y$ are consistent, 
namely using both (42) and (43) we get 

$$ \forall v \in V,\ \min(\mu_B(v), \Pi(A ; A')) \le \pi_y(v) \le \max(\mu_B(v), 1 - N(A ; A')) \tag{48} $$

It corresponds to the case of a piece of knowledge saying both that "the more x is A, 
the more possible B as a range for y and the more certain y is in B". 

[Figure 2.c : Two rules in parallel and a fuzzy input. Surviving panel titles : "Gradual rules (focusing on cores)", "Certainty rules"] 


6. RULES EXPRESSING PREFERENCE 

So far, we have focused on "reasoning rules" whose aim is to describe the 
relationship between relevant parameters in some problem, e.g. a diagnostic problem. 
There is another class of "if... then..." rules which express preference and whose aim 
is to help in making choices rather than to make implicit knowledge explicit. Their 
format is, in the crisp case : 

if <situation> then <decision>. 

<situation> is the description of the states of the world where some decision can be 
recommended. <decision> can be some action to perform, some procedure to trigger 
or even an assignment statement (like "choose for y the value $v_0$"). When there are 
many possible states of the world, it is difficult to partition them into rigid classes 
where specific decisions can be totally recommended. As a result, the description of 
the states of the world where a decision is relevant is often fuzzy, because decisions 
can be more or less recommended. Hence the "if" part contains fuzzy predicates, and 
the preference rule means 

"the more the state of the world corresponds to <situation>, 
the more recommended is <decision>" 

Let x be a vector that contains the precise description of the world, S be a fuzzy set 
of values of x corresponding to the description of a range of situations, and u(d) the 
preference degree for decision d. By definition, u(d) = 0 means that d should be 
rejected, u(d) = 1 means that d can be applied without any doubt. The fuzzy preference 
rule just indicates that u(d) can be quantified by $\mu_S(x)$. 

In fuzzy control (e.g. Mamdani (1977), Sugeno (1985)), fuzzy rules can be 
viewed as preference rules of the form 

if x is $A_i$ then ($\pi_y = \mu_{B_i}$) 

where $B_i$ is viewed as a fuzzy set of recommended (possible) actions, and an action is 
the selection of a value for y, the control parameter. Hence, it is a more general kind 
of preference rule than the one where only one decision is involved in the conclusion 
part. Instead of proposing a single decision in situation $A_i$, a weighted ordered set is 
proposed, as described by $\pi_y$. The preference $u_i(d)$ of assignment y = d, in the 
presence of $x = x_0$, can be evaluated as a function of $\mu_{B_i}(d)$ and $\mu_{A_i}(x_0)$, for a single 
rule i. Among natural conditions to be fulfilled is that $u_i(d) \le \min(\mu_{B_i}(d), \mu_{A_i}(x_0))$, 
which claims that $\mu_{A_i}(x_0)$ stands as an upper bound on the degree of preference for y = 
d, induced by rule i. When the equality is taken for granted, we get the fuzzy control 
approach. 

Usually a preference rule does not stand alone. The set of states of the world is 
partitioned into a family of situations, where in each situation, a decision is 
recommended. It corresponds to a decision table of the form 

if <situation 1> then <decision 1> 
else if <situation 2> then <decision 2> 
else ... 
else if <situation n> then <decision n> 
else <decision n + 1> 

where <decision n + 1> may suggest refraining from deciding in the case of an (n + 1)th 
situation that is defined by complementarity. If <decision i> corresponds to a single 
decision, then, when <situation i> is fuzzy, the output is a fuzzy set of recommended 
decisions $\{(d_i, u_i(d_i)),\ i = 1,n+1\}$. This is what happens in the OPAL system for 
instance (Bensana et al., 1988) where a decision table corresponds to a priority rule 
in a job shop, a decision is of the form "operation $O_1$ precedes operation $O_2$", and 
$d_{n+1}$ recommends not to sequence the operations for the moment. 

In fuzzy control, any elementary decision d receives an evaluation from several 
preference rules, since the $B_i$'s may overlap. Hence the preference weight for d, u(d), 
is a function of $u_i(d)$, i = 1,n (since there is no refraining from decision in fuzzy 
control, i.e. $d_{n+1}$ does not exist). A natural condition in a decision table is that a 
decision is selected as soon as one rule recommending the decision is triggered. 
Hence we should have 

$$ u(d) \ge \max_{i=1,n} u_i(d) \tag{49} $$

The case of equality, again, corresponds to the choice of fuzzy control people. Strictly 
speaking the above scheme works provided that the actual situation is precisely 
described, i.e. $x = x_0$. But, since the behavior of fuzzy decision rules is based on 
fuzzy pattern matching between the description of the actual situation and the 
prototypical situations in the fuzzy rules, extension to the case of ill-described inputs 
is easy to envisage. If what is known about x is that x is A' where A' is a fuzzy set, 
u(d) should become a fuzzy number induced by the degrees of compatibility between 
A' and $A_i$ (Zadeh, 1978b), i.e. $\mu_{A_i}(A')$, defined by extending the function $\mu_{A_i}$ to the 
fuzzy value A'. Hence we should replace max in (49) by the extended max in fuzzy 
arithmetic (e.g. Dubois and Prade, 1988). 

In fuzzy pattern matching (Dubois, Prade and Testemale, 1988), we often use the
degrees of possibility $\Pi(A_i ; A')$ and necessity $N(A_i ; A')$, instead of
the fuzzy truth-value $\mu_{A_i}(A')$. Letting an equality stand in (49), and
approximating $\mu_{A_i}(A')$ by the interval $[N(A_i ; A'), \Pi(A_i ; A')]$, we
get the following results

$$\min(N(A_i ; A'), \mu_{B_i}(d)) \le \mu_i(d) \le \min(\Pi(A_i ; A'), \mu_{B_i}(d))$$

and

$$\max_{i=1,n} \min(N(A_i ; A'), \mu_{B_i}(d)) \le \mu(d) \le \max_{i=1,n} \min(\Pi(A_i ; A'), \mu_{B_i}(d))$$
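
As a rough illustration of these bounds, the following Python sketch works on a
small discrete universe. It assumes the usual fuzzy pattern matching
definitions, possibility = sup min(mu_A, mu_A') and necessity = inf max(mu_A,
1 - mu_A'), which this section does not restate. The fuzzy sets, decision
names and function names are invented for the example.

```python
def possibility(pattern, datum):
    """Pi(A; A') = max over x of min(mu_A(x), mu_A'(x)) (assumed definition)."""
    return max(min(pattern[x], datum[x]) for x in pattern)


def necessity(pattern, datum):
    """N(A; A') = min over x of max(mu_A(x), 1 - mu_A'(x)) (assumed definition)."""
    return min(max(pattern[x], 1 - datum[x]) for x in pattern)


def preference_bounds(rules, datum, d):
    """Lower and upper bounds on mu(d) for a list of (A_i, B_i) rules."""
    lower = max(min(necessity(a, datum), b[d]) for a, b in rules)
    upper = max(min(possibility(a, datum), b[d]) for a, b in rules)
    return lower, upper


# Hypothetical situations A1, A2 and recommended decisions B1, B2.
A1 = {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0}
A2 = {0: 0.0, 1: 0.5, 2: 1.0, 3: 1.0}
B1 = {"d1": 1.0, "d2": 0.25}
B2 = {"d1": 0.0, "d2": 1.0}

# Ill-described input "x is A'".
A_prime = {0: 1.0, 1: 0.5, 2: 0.0, 3: 0.0}

lower, upper = preference_bounds([(A1, B1), (A2, B2)], A_prime, "d1")
print(lower, upper)
```

Keeping only `upper` corresponds to the choice attributed to the fuzzy control
literature; the gap between `lower` and `upper` reflects the imprecision of the
input.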

Note that in the fuzzy control literature only the upper bound of $\mu(d)$ is
adopted, while the degree of necessity $N(A_i ; A')$ should be used to describe
imprecision due to a fuzzy input. Also it is clear from above that preference
rules do not behave like reasoning rules. In particular, the inequality
$\mu_i(d) \le \min(\mu_{A_i}(x_0), \mu_{B_i}(d))$ is not in accordance with the
possibility rules in the previous sections since the semantics of the latter
leads to the opposite inequality.

A last issue with preference rules is that more than one decision table may be
involved in the selection of a decision d. Indeed several points of view,
objectives, etc. may be simultaneously present, and some strategy must be
adopted in the presence of conflicting decision tables. This problem is absent
from the fuzzy control literature where a fuzzy controller generally involves
one decision table only. However, in planning, decisions may be motivated by
several conflicting criteria.
Again this is what happens in the OPAL system (Bensana et al., 1988) for instance.
The problem of cooperation between decision tables has been modelled in terms of 
social choice (see Bel et al., 1989) and a software architecture for implementing fuzzy 
decision tables and cooperation strategies has been devised (Dubois, Koning and Bel, 
1989). It is based on a social choice interpretation of fuzzy set aggregation 
connectives that is described elsewhere (Dubois and Koning, 1989). 

The problem of preference rules and their implementation in rule-based systems 
is certainly one of the important topics in Artificial Intelligence, for the forthcoming 
years, as witnessed by some current activity in this area, from the standpoint of 
utility theory (Keeney, 1988 ; Klein and Shortliffe, 1990), or cognitive psychology 
(Pinson, 1987). 

7 - CONCLUDING REMARKS 

The semantic content of rules in fuzzy expert systems has received little
attention until now in spite of the enormous quantity of existing literature about
approximate reasoning and fuzzy controllers. The paper has tried to formally derive
different kinds of fuzzy rules based on very simple semantical considerations. Four
types of rules have emerged corresponding to very standard alterations of a
possibility distribution: enlarging its core, shrinking its support, truncating
its height, or drowning it in a uniform level of uncertainty. The paper has also
pointed out that fuzzy decision rules do not behave like fuzzy rules describing
relationships.

Besides, rule-based expert systems have always been associated with an efficient
local computation strategy where a partial conclusion obtained from a (compound)
fact and a rule has to be combined with other conclusions pertaining to the same
matter and derived from other facts and rules. This kind of strategy can be
especially dangerous in the presence of vague and uncertain pieces of knowledge
since it may yield conclusions which are not as accurate as could be expected
from the available knowledge. Such conclusions may even be incorrect (see Pearl
(1988), Heckerman and Horvitz (1988), Dubois and Prade (1989b) for instance).
This is due to the fact that each rule, each variable to evaluate, cannot always
be considered independently in the evaluation process. A possibilistic hypergraph
technique coping with this problem has been recently developed by Kruse and
Schwecke (1990), and by Dubois and Prade (1990c).

1 Introduction

Fuzzy Set Theory, introduced by Zadeh in 1965 [77], has been the subject 
of much controversy and debate. In recent years, it has found many 
applications in a variety of fields. Among the most successful applications 
of this theory has been the area of Fuzzy Logic Control (FLC) initiated 
by the work of Mamdani and Assilian [36]. FLC has had considerable 
success in Japan, where many commercial products using this technology
have been built.

In this paper, we will review the basic architecture of fuzzy logic con- 
trollers and discuss why this technology often provides controllers with 
performance similar to the performance of an expert human operator for 
ill-defined and complex systems. In section 2, an introductory survey of 
the basics of fuzzy set theory is presented. Next, the basic architecture 
of a FLC is described, followed by a brief review of the application of 
this theory. Finally, we discuss how a fuzzy logic based control system 
can learn from experience to fine-tune its performance. 

2 Fuzzy sets and Fuzzy logic: The basis for 
Fuzzy Control 

A fuzzy set is an extension of a crisp set. Crisp sets only allow full
membership or no membership at all, whereas fuzzy sets allow partial
membership. In other words, an element may partially belong to a set.
In a crisp set, the membership or non-membership of an element x in set
A is described by a characteristic function $\mu_A(x)$, where:

$$\mu_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A. \end{cases}$$

Figure 1: Examples of Fuzzy membership functions


Fuzzy set theory extends this concept by defining partial memberships 
which can take values ranging from 0 to 1: 


$$\mu_A : X \to [0, 1]$$


where X refers to the universal set defined in a specific problem. If this 
universal set is countable and finite, then a fuzzy set A in this universe 
can be defined by listing each member and its degree of membership in 
the set A: 

$$A = \sum_{i=1}^{n} \mu_A(x_i)/x_i.$$

Similarly, if X is continuous, then a fuzzy set A can be defined by 


$$A = \int_X \mu_A(x)/x.$$



Note that in the above definitions, “/” does not refer to a division and
is used as a notation to separate the membership of an element from the
element itself. For example, in A = .2/element1 + .6/element2, element1
has a membership value of .2 and element2 has a membership value of .6
in the fuzzy set A. As another example, the linguistic term Positive
as shown in Figure 1 may be defined to take the following membership
function:


$$\mu_{positive}(x) = \begin{cases} 1 & \text{if } x \ge 4 \\ \ldots & \text{if } 1 \le x < 4 \\ 0 & \text{otherwise.} \end{cases}$$


The support of a fuzzy set A in the universal set X is a crisp set that
contains all the elements of X which have degree of membership greater
than zero¹. In the above example, the support set includes all the real
numbers for which $\mu(x) > 0$.

The $\alpha$-cut of a fuzzy set A is defined as the crisp set of all the elements
of the universe X which have memberships in A greater than or equal
to $\alpha$, where

$$A_\alpha = \{x \in X \mid \mu_A(x) \ge \alpha\}.$$

For example, if the fuzzy set A is described by its membership function:

A = {.2/2 + .4/3 + .6/4 + .8/5 + 1/6}

and $\alpha = .3$ then the $\alpha$-cut of A is

$$A_{.3} = \{3, 4, 5, 6\}.$$

The height of a fuzzy set is defined as the highest membership value of 
the set. If height(A) = 1, then set A is called a normalized fuzzy set. 
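
The definitions of support, cut and height can be checked on the discrete
example above with a short Python sketch. Representing a fuzzy set as a
dictionary from elements to membership grades is an assumption of this
illustration, and the function names are hypothetical.

```python
# The fuzzy set A = {.2/2 + .4/3 + .6/4 + .8/5 + 1/6} from the text.
A = {2: 0.2, 3: 0.4, 4: 0.6, 5: 0.8, 6: 1.0}


def support(fuzzy_set):
    """Crisp set of elements with membership strictly greater than zero."""
    return {x for x, mu in fuzzy_set.items() if mu > 0}


def alpha_cut(fuzzy_set, alpha):
    """Crisp set of elements with membership greater than or equal to alpha."""
    return {x for x, mu in fuzzy_set.items() if mu >= alpha}


def height(fuzzy_set):
    """Highest membership value of the set."""
    return max(fuzzy_set.values())


def is_normalized(fuzzy_set):
    """A fuzzy set is normalized when its height is 1."""
    return height(fuzzy_set) == 1


print(alpha_cut(A, 0.3))   # the elements 3, 4, 5 and 6
print(support(A), height(A), is_normalized(A))
```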

2.1 Fuzzy Set Operations 

Assuming that A and B are two fuzzy sets with membership functions
$\mu_A$ and $\mu_B$, then the following operations can be defined on these sets.
The complement of a fuzzy set A is a fuzzy set $\bar{A}$ with a membership
function

$$\mu_{\bar{A}}(x) = 1 - \mu_A(x).$$

The Union of A and B is a fuzzy set with the following membership
function

$$\mu_{A \cup B} = \max\{\mu_A, \mu_B\}.$$

The Intersection of A and B is a fuzzy set with the membership function

$$\mu_{A \cap B} = \min\{\mu_A, \mu_B\}.$$

By definition, Concentration is a unary operation which, when ap- 
plied to a fuzzy set A, results in a fuzzy subset of A in such a way that 
reduction in higher grades of membership is much less than the reduction 
in lower grades of membership. In other words, by concentrating a fuzzy 
set, members with low grades of membership will have even lower grades 
of memberships and hence the fuzzy set becomes more concentrated. A 
common concentration operator is to square the membership function: 

$$\mu_{CON(A)}(x) = (\mu_A(x))^2$$

and a typical concentration operator is the term Very which is also a
Linguistic Hedge [78]. For example, the result of applying the operator
Very on a fuzzy label Small is a new fuzzy label Very Small.

1 Mabuchi [35] has used this concept in comparing fuzzy subsets.

Figure 2: (a) Monotonic, (b) Triangular, (c) Trapezoidal, (d) Bell-shaped
membership functions

The Dilution operator is the converse of the concentration operator 
described above: 

$$\mu_{DIL(A)}(x) = \sqrt{\mu_A(x)}.$$
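
The operations of this section can be sketched in a few lines of Python for
fuzzy sets defined over a common finite universe. The dictionary
representation, the labels `small` and `large`, and the function names are
assumptions made for the illustration only.

```python
import math


def complement(a):
    """mu of the complement: 1 - mu_A(x)."""
    return {x: 1 - mu for x, mu in a.items()}


def union(a, b):
    """mu of the union: max(mu_A(x), mu_B(x))."""
    return {x: max(a[x], b[x]) for x in a}


def intersection(a, b):
    """mu of the intersection: min(mu_A(x), mu_B(x))."""
    return {x: min(a[x], b[x]) for x in a}


def concentrate(a):
    """Concentration (e.g. the hedge Very): square the membership grades."""
    return {x: mu ** 2 for x, mu in a.items()}


def dilute(a):
    """Dilution: take the square root of the membership grades."""
    return {x: math.sqrt(mu) for x, mu in a.items()}


# Hypothetical labels over the universe {1, 2, 3, 4}.
small = {1: 1.0, 2: 0.5, 3: 0.25, 4: 0.0}
large = {1: 0.0, 2: 0.25, 3: 0.5, 4: 1.0}

very_small = concentrate(small)
print(very_small)
print(union(small, large), intersection(small, large))
```

Squaring leaves grades of 0 and 1 unchanged and lowers every grade in between,
so low grades shrink proportionally more than high ones, which is the behavior
described for concentration.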

3 The Basic Architecture Of FLC 

Different methods for developing fuzzy logic controllers have been
suggested over the past 15 years. In the design of a fuzzy controller, one
must identify the main control parameters and determine a term set
which is at the right level of granularity for describing the values of each
linguistic variable². For example, a term set including linguistic values
such as {Small, Medium, Large} may not be satisfactory in some
domains, and may instead require the use of a five-term set such as
{Very Small, Small, Medium, Large, Very Large}.

Different types of fuzzy membership functions have been used in fuzzy
logic control. However, four types are most common. The first type
assumes monotonic membership functions such as those shown in Figure
2(a). This type is simple and has been used in studies such as [8, 64].
Other types using triangular, trapezoidal, and bell-shaped functions have
also been used as shown in Figure 2(b), (c), and (d) respectively.


2 A linguistic variable is a variable which can only take linguistic values.

Figure 3: A simple architecture of a fuzzy logic controller

The selection of the types of fuzzy variable directly affects the type 
of reasoning to be performed by the rules using these variables. This is 
described later in 3.3. After the values of the main control parameters 
are determined, a knowledge base is developed using the above control 
variables and the values that they may take. If the knowledge base is a 
rule base, more than one rule may fire requiring the selection of a conflict 
resolution method for decision making, as will be described later. 

Figure 3 illustrates a simple architecture for a fuzzy logic controller. 
The system dynamics of the plant in this architecture is measured by a 
set of sensors. This architecture consists of four modules whose functions 
are described next. 

3.1 Coding the Inputs: Fuzzification 

In coding the values from the sensors, one transforms the values of the 
sensor measurements in terms of the linguistic labels used in the precon- 
ditions of the rules. 

If the sensor reading has a crisp value, then the fuzzification stage 
requires matching the sensor measurement against the membership func- 
tion of the linguistic label as shown in Figure 4(a). If the sensor reading 
contains noise, it may be modeled by using a triangular membership 
function where the vertex of the triangle refers to the mean value of the 
data set of sensor measurements and the base refers to a function of the 
standard deviation (e.g., twice the standard deviation as used in [69]). 
Then in this case, fuzzification refers to finding out the intersection of
the label’s membership function and the distribution for the sensed data
as shown in Figure 4(b). However, the most widely used fuzzification
method is the former case when the sensor reading is crisp.

Figure 4: Matching a sensor reading $x_0$ with the membership function
to get $\mu(x_0)$: (a) crisp sensor reading, (b) fuzzy sensor reading.
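
The two fuzzification cases can be sketched as follows. This is only an
illustration under stated assumptions: the label is taken to be triangular,
the noisy reading is modeled by a triangle whose base is twice the standard
deviation, and the height of the intersection is approximated on a grid. The
helper names and the grid size are hypothetical.

```python
def triangle(left, peak, right):
    """Triangular membership function with left < peak < right."""
    def mu(x):
        if left < x <= peak:
            return (x - left) / (peak - left)
        if peak < x < right:
            return (right - x) / (right - peak)
        return 0.0
    return mu


def fuzzify_crisp(label, x0):
    """Crisp reading: evaluate the label's membership function at x0."""
    return label(x0)


def fuzzify_noisy(label, mean, std, steps=1000):
    """Noisy reading: height of the intersection of label and sensor triangle."""
    sensor = triangle(mean - std, mean, mean + std)  # base = 2 * std
    lo, hi = mean - std, mean + std
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(min(label(x), sensor(x)) for x in xs)


small = triangle(0.0, 2.0, 4.0)             # hypothetical linguistic label
crisp_degree = fuzzify_crisp(small, 1.0)    # 0.5
noisy_degree = fuzzify_noisy(small, mean=1.0, std=1.0)
print(crisp_degree, noisy_degree)
```

With these numbers the noisy reading matches the label to a degree of about
2/3, higher than the crisp degree of 0.5, because part of the sensor
distribution lies closer to the peak of the label.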

3.2 Setting up the Control Knowledge Base

There are two main tasks in designing the control knowledge base. First, 
a set of linguistic variables must be selected which describe the values of 
the main control parameters of the process. Both the main input param- 
eters and the main output parameters must be linguistically defined in 
this stage using proper term sets. The selection of the level of granularity 
of a term set for an input variable or an output variable plays an im- 
portant role in the smoothness of control. Secondly, a control knowledge 
base must be developed which uses the above linguistic description of 
the main parameters. Sugeno [49] has suggested four methods for doing 
this: 

1. Expert’s Experience and Knowledge 

2. Modelling the Operator’s Control Actions 

3. Modeling a process 

4. Self Organization 

Among the above methods, the first method is the most widely used
[36]. In modeling the human expert operator’s knowledge, fuzzy control
rules of the form:

IF Error is small and Change-in-error is small THEN Force is small

have been used in studies such as [51, 53]. This method is effective
when expert human operators can express the heuristics or the
knowledge that they use in controlling a process in terms of rules of the above
form. Applications have been developed in process control (e.g., cement
kiln operations [23]). Besides the ordinary fuzzy control rules which have
been used by Mamdani and others, where the conclusion of a rule is
another fuzzy variable, a rule can be developed whereby its conclusion is a
function of the input parameters. For example, the following implication
can be written:

IF X is $A_1$ and Y is $B_1$ THEN Z = $f_1(X, Y)$

where the output Z is a function of the values that X and Y may take. 

The second method above directly models the control actions of
the operator. Instead of interviewing the operator, the types of control
actions taken by the operators are modelled. Takagi and Sugeno [55] 
and Sugeno and Murakami [51] have used this method for modeling the 
control actions of a driver in parking a car. 

The third method deals with fuzzy modeling of a process where an 
approximate model of the plant is configured by using implications which 
describe the possible states of the system. In this method a model is de- 
veloped and a fuzzy controller is constructed to control the fuzzy model, 
making this approach similar to the traditional approach taken in con- 
trol theory. Hence, structure identification and parameter identification 
processes are needed. For example, a rule discussed by Sugeno [49] is of 
the form: 

If $x_1$ is $A_1^i$, $x_2$ is $A_2^i$, ..., then $y = p_0^i + p_1^i x_1 + p_2^i x_2 + \ldots + p_m^i x_m$

for i = 1, ..., n where n is the number of such implications and the
consequence is a linear function of the m input variables.

Finally, the fourth method refers to the research of Mamdani and his 
students in developing self-organizing controllers [44]. The main idea in 
this method is the development of rules which can be adjusted over time 
to improve the controllers’ performance. This method is very similar to 
recent work in the use of neural networks in designing the knowledge base 
of a fuzzy logic controller which will be discussed later in this chapter. 

3.3 Conflict Resolution and Decision Making

As mentioned earlier, because of the partial matching attribute of fuzzy
control rules and the fact that the preconditions of the rules do overlap,
usually more than one fuzzy control rule can fire at one time. The
methodology which is used in deciding what control action should be
taken as the result of the firing of several rules can be referred to as the
process of conflict resolution. The following example, using two rules,
illustrates this process. Assume that we have the following:

Rule 1: IF X is $A_1$ and Y is $B_1$ THEN Z is $C_1$
Rule 2: IF X is $A_2$ and Y is $B_2$ THEN Z is $C_2$

[...] membership functions: (a) monotonic, (b) triangular, (c) trapezoidal,
(d) bell-shaped

Now, if we have $x_0$ and $y_0$ as the sensor readings for fuzzy variables X
and Y, then their truth values are represented by $\mu_{A_1}(x_0)$ and
$\mu_{B_1}(y_0)$ respectively for Rule 1, where $\mu_{A_1}$ represents the
membership function for $A_1$. Similarly for Rule 2, we have $\mu_{A_2}(x_0)$
and $\mu_{B_2}(y_0)$ as the truth values of the preconditions. The strength of
Rule 1 can be calculated by:

$$\alpha_1 = \mu_{A_1}(x_0) \wedge \mu_{B_1}(y_0)$$

where $\wedge$ refers to the conjunction operator which was defined earlier to
be equal to the Minimum operator. Similarly for Rule 2:

$$\alpha_2 = \mu_{A_2}(x_0) \wedge \mu_{B_2}(y_0).$$



The control output of rule 1 is calculated by applying the matching
strength of its preconditions on its conclusion:

$$\mu_{C'_1}(w) = \alpha_1 \wedge \mu_{C_1}(w),$$

and for Rule 2:

$$\mu_{C'_2}(w) = \alpha_2 \wedge \mu_{C_2}(w)$$

where w ranges over the values that the rule conclusions can take. This
means that as a result of reading sensor values $x_0$ and $y_0$, Rule 1 is
recommending a control action with $\mu_{C'_1}(w)$ as its membership function
and Rule 2 is recommending a control action with $\mu_{C'_2}(w)$ as its
membership function. The conflict-resolution process then produces

$$\mu_C(w) = \mu_{C'_1}(w) \vee \mu_{C'_2}(w) = [\alpha_1 \wedge \mu_{C_1}(w)] \vee [\alpha_2 \wedge \mu_{C_2}(w)]$$

where $\mu_C(w)$ is a pointwise membership function for the combined
conclusion of Rule 1 and Rule 2. The $\wedge$ and $\vee$ operators above are
defined to be the Min and Max functions respectively [36]. The result of this
last operation (i.e., $\mu_C(w)$) is a membership function and has to be
translated (defuzzified) to a single value as discussed next.
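
The min-max combination just described can be sketched in Python as follows.
The triangular shapes, the breakpoints and the sensor readings are invented
for this illustration and are not taken from the text; only the use of Min
for conjunction and Max for combining conclusions follows the description
above.

```python
def triangle(left, peak, right):
    """Triangular membership function with left < peak < right."""
    def mu(v):
        if left < v <= peak:
            return (v - left) / (peak - left)
        if peak < v < right:
            return (right - v) / (right - peak)
        return 0.0
    return mu


def rule_strength(mu_a, mu_b, x0, y0):
    """Strength of a rule: Min of the truth values of its preconditions."""
    return min(mu_a(x0), mu_b(y0))


def clip(alpha, mu_c):
    """Conclusion of one rule: alpha Min mu_C(w)."""
    return lambda w: min(alpha, mu_c(w))


def combine(*conclusions):
    """Combined conclusion: pointwise Max over the clipped conclusions."""
    return lambda w: max(c(w) for c in conclusions)


# Hypothetical labels for two rules.
A1, B1, C1 = triangle(0, 5, 10), triangle(0, 5, 10), triangle(0, 2, 4)
A2, B2, C2 = triangle(5, 10, 15), triangle(5, 10, 15), triangle(2, 4, 6)

x0, y0 = 6.0, 7.0
alpha1 = rule_strength(A1, B1, x0, y0)   # min(0.8, 0.6)
alpha2 = rule_strength(A2, B2, x0, y0)   # min(0.2, 0.4)
mu_c = combine(clip(alpha1, C1), clip(alpha2, C2))
print(alpha1, alpha2, [mu_c(w) for w in (2.0, 3.0, 4.0)])
```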

3.4 Decoding the Outputs: Defuzzification 

This necessary operation produces a nonfuzzy control action that best 
represents the membership function of an inferred fuzzy control action. 
Several defuzzification strategies have been suggested in the literature.
Among them, the four methods which have been applied most often are described
here.

3.4.1 Tsukamoto’s defuzzification method 

As shown in Figure 6(a), if monotonic membership functions are used,
then a crisp control action can be calculated by:

$$Z^* = \frac{\sum_{i=1}^{n} \alpha_i x_i}{\sum_{i=1}^{n} \alpha_i}$$

where n is the number of rules with firing strength ($\alpha_i$) greater than 0
and $x_i$ is the amount of control action recommended by rule i.
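
A minimal sketch of this weighted average follows. It assumes that each
rule's recommendation is obtained by inverting a linearly increasing
membership function at the rule's firing strength; that inversion helper, the
ramps and the numbers are illustrative assumptions, not part of the text.

```python
def inverse_increasing_ramp(alpha, lo, hi):
    """Value x with mu(x) = alpha, for mu rising linearly from 0 at lo to 1 at hi."""
    return lo + alpha * (hi - lo)


def tsukamoto(recommendations):
    """Weighted average of (firing strength, recommended action) pairs."""
    fired = [(a, x) for a, x in recommendations if a > 0]
    if not fired:
        raise ValueError("no rule fired")
    return sum(a * x for a, x in fired) / sum(a for a, _ in fired)


# Two hypothetical fired rules and one rule with zero strength.
x1 = inverse_increasing_ramp(0.75, 0.0, 4.0)   # 3.0
x2 = inverse_increasing_ramp(0.25, 4.0, 8.0)   # 5.0
z_star = tsukamoto([(0.75, x1), (0.25, x2), (0.0, 9.0)])
print(z_star)   # 3.5
```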
3.4.2 The Center Of Area (COA) method

Assuming that a control action with a pointwise membership function
$\mu_C$ has been produced, the Center of Area method calculates the center
of gravity of the distribution for the control action. Assuming a discrete
universe of discourse, we have

$$Z^* = \frac{\sum_{j=1}^{q} z_j \mu_C(z_j)}{\sum_{j=1}^{q} \mu_C(z_j)}$$

where q is the number of quantization levels of the output, $z_j$ is the
amount of control output at the quantization level j and $\mu_C(z_j)$
represents its membership value in C.


3.4.3 The Mean of Maximum (MOM) Method 

The Mean of Maximum Method (MOM) generates a crisp control action
by averaging the support values whose membership values reach
the maximum. For a discrete universe, this is calculated by

$$Z^* = \sum_{j=1}^{l} \frac{z_j}{l}$$

where l is the number of quantized z values which reach their maximum
memberships.
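
Both discrete defuzzification formulas are short enough to sketch directly.
The quantization levels and membership grades below are made up for the
illustration, and the function names are hypothetical.

```python
def center_of_area(levels, mu_c):
    """COA: membership-weighted average of the quantization levels."""
    total = sum(mu_c(z) for z in levels)
    if total == 0:
        raise ValueError("membership function is zero everywhere")
    return sum(z * mu_c(z) for z in levels) / total


def mean_of_maximum(levels, mu_c):
    """MOM: average of the levels whose membership reaches the maximum."""
    peak = max(mu_c(z) for z in levels)
    maxima = [z for z in levels if mu_c(z) == peak]
    return sum(maxima) / len(maxima)


# Hypothetical combined conclusion over six quantization levels.
grades = {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0, 5: 0.5, 6: 0.5}
levels = sorted(grades)

z_coa = center_of_area(levels, grades.get)
z_mom = mean_of_maximum(levels, grades.get)
print(z_coa, z_mom)
```

COA takes the whole distribution into account, so the tail at levels 5 and 6
pulls its result above the MOM value, which only looks at the two levels with
maximal membership.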


3.4.4 Defuzzification when the outputs of the rules are functions
of their inputs

As mentioned earlier, fuzzy control rules may be written as a function
of their inputs. For example,

Rule i: IF X is $A_i$ and Y is $B_i$ THEN Z is $f_i(X, Y)$

assuming that $\alpha_i$ is the firing strength of rule i, then

$$Z^* = \frac{\sum_{i=1}^{n} \alpha_i f_i(x_i, y_i)}{\sum_{i=1}^{n} \alpha_i}$$

where n is the number of firing rules.
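
The weighted average for rules with functional conclusions can be sketched as
below. As an assumption of this illustration, every consequent function is
evaluated at the same crisp sensor readings, and the firing strength is taken
as the Min of the precondition truth values as in section 3.3. The labels
`low` and `high` and the two consequent functions are invented.

```python
def low(v):
    """Hypothetical label: 1 at v = 0, falling linearly to 0 at v = 4."""
    return max(0.0, min(1.0, 1 - v / 4))


def high(v):
    """Hypothetical label: 0 at v = 0, rising linearly to 1 at v = 4."""
    return max(0.0, min(1.0, v / 4))


def functional_defuzzify(rules, x0, y0):
    """Firing-strength-weighted average of the rule outputs f_i(x0, y0)."""
    numerator = 0.0
    denominator = 0.0
    for mu_a, mu_b, f in rules:
        alpha = min(mu_a(x0), mu_b(y0))   # firing strength of the rule
        numerator += alpha * f(x0, y0)
        denominator += alpha
    if denominator == 0:
        raise ValueError("no rule fired")
    return numerator / denominator


rules = [
    (low, low, lambda x, y: x + y),         # IF X is low and Y is low
    (high, high, lambda x, y: 2 * x + y),   # IF X is high and Y is high
]
z_star = functional_defuzzify(rules, 1.0, 2.0)
print(z_star)
```

Here rule 1 fires with strength 0.5 and proposes 3, rule 2 fires with strength
0.25 and proposes 4, so the result is (0.5 * 3 + 0.25 * 4) / 0.75.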
Figure 6: Defuzzification of the combined conclusion of rules described
in the example.


3.4.5 An example 

Assume that we have the following two rules: 

Rule 1: IF X is $A_1$ and Y is $B_1$ THEN Z is $C_1$
Rule 2: IF X is $A_2$ and Y is $B_2$ THEN Z is $C_2$

Suppose $x_0$ and $y_0$ are the sensor readings for fuzzy variables X and Y,
and the following are membership functions:

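$$\mu_{A_1}(x) = \begin{cases} \frac{x-2}{3} & 2 \le x \le 5 \\ \frac{8-x}{3} & 5 < x \le 8 \end{cases} \qquad \mu_{A_2}(x) = \begin{cases} \frac{x-3}{3} & 3 \le x \le 6 \\ \frac{9-x}{3} & 6 < x \le 9 \end{cases}$$

$$\mu_{B_1}(y) = \begin{cases} \frac{y-5}{3} & 5 \le y \le 8 \\ \frac{11-y}{3} & 8 < y \le 11 \end{cases} \qquad \mu_{B_2}(y) = \begin{cases} \frac{y-4}{3} & 4 \le y \le 7 \\ \frac{10-y}{3} & 7 < y \le 10 \end{cases}$$

$$\mu_{C_1}(z) = \begin{cases} \frac{z-1}{3} & 1 \le z \le 4 \\ \frac{7-z}{3} & 4 < z \le 7 \end{cases} \qquad \mu_{C_2}(z) = \begin{cases} \frac{z-3}{3} & 3 \le z \le 6 \\ \frac{9-z}{3} & 6 < z \le 9 \end{cases}$$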

Further assume that we are reading the sensor values $x_0 = 4$ and
$y_0 = 8$. We illustrate how to calculate

1. the membership function for the control action recommended by
the combination of these two rules

2. the crisp value of the control action using the COA and MOM
methods.


First, the sensor readings $x_0$ and $y_0$ have to be matched against the
preconditions $A_1$, $B_1$ respectively. This will produce
$\mu_{A_1}(x_0) = 2/3$ and $\mu_{B_1}(y_0) = 1$. Similarly, for rule 2, we have
$\mu_{A_2}(x_0) = 1/3$ and $\mu_{B_2}(y_0) = 2/3$.

The strength of rule 1 is calculated by:

$$\alpha_1 = \min(\mu_{A_1}(x_0), \mu_{B_1}(y_0)) = \min(2/3, 1) = 2/3,$$

and similarly for rule 2:

$$\alpha_2 = \min(\mu_{A_2}(x_0), \mu_{B_2}(y_0)) = \min(1/3, 2/3) = 1/3.$$


Applying $\alpha_1$ to the conclusion of rule 1 results in the shaded trapezoid
shown in Figure 6 for $C_1$. Similarly, applying $\alpha_2$ to the conclusion
of rule 2 results in the dashed trapezoid shown in Figure 6 for $C_2$. By
superimposing the resulting memberships over each other and using the
Max operator, the membership function for the combined conclusion of
these rules is found (as shown on the right-hand side of Figure 6).
Furthermore, using the COA method (explained earlier), the defuzzified
value for the conclusion is found:


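$$Z^*_{COA} = \frac{2 \cdot \frac{1}{3} + 3 \cdot \frac{2}{3} + 4 \cdot \frac{2}{3} + 5 \cdot \frac{2}{3} + 6 \cdot \frac{1}{3} + 7 \cdot \frac{1}{3} + 8 \cdot \frac{1}{3}}{\frac{1}{3} + \frac{2}{3} + \frac{2}{3} + \frac{2}{3} + \frac{1}{3} + \frac{1}{3} + \frac{1}{3}} = 4.7$$

Using the MOM defuzzification strategy, three quantized values reach
their maximum memberships in the combined membership function (i.e.,
3, 4, and 5 with membership values of 2/3). Therefore,

$$Z^*_{MOM} = \frac{3 + 4 + 5}{3} = 4.0$$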
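
The whole example can be replayed with exact fractions. The sketch below
assumes triangular membership functions with the breakpoints listed in the
example and integer quantization levels 1 to 9 for the output; the helper
names are hypothetical.

```python
from fractions import Fraction


def triangle(left, peak, right):
    """Triangular membership function evaluated with exact fractions."""
    def mu(v):
        v = Fraction(v)
        if left <= v <= peak:
            return (v - left) / (peak - left)
        if peak < v <= right:
            return (right - v) / (right - peak)
        return Fraction(0)
    return mu


A1, A2 = triangle(2, 5, 8), triangle(3, 6, 9)
B1, B2 = triangle(5, 8, 11), triangle(4, 7, 10)
C1, C2 = triangle(1, 4, 7), triangle(3, 6, 9)

x0, y0 = 4, 8
alpha1 = min(A1(x0), B1(y0))   # strength of rule 1
alpha2 = min(A2(x0), B2(y0))   # strength of rule 2


def mu_c(z):
    """Combined conclusion: Max of the two clipped conclusions."""
    return max(min(alpha1, C1(z)), min(alpha2, C2(z)))


levels = range(1, 10)
z_coa = sum(z * mu_c(z) for z in levels) / sum(mu_c(z) for z in levels)

peak = max(mu_c(z) for z in levels)
maxima = [z for z in levels if mu_c(z) == peak]
z_mom = Fraction(sum(maxima), len(maxima))

print(alpha1, alpha2)          # 2/3 1/3
print(float(z_coa), maxima, float(z_mom))
```

The sketch reproduces the firing strengths 2/3 and 1/3, the maxima at 3, 4 and
5, and the defuzzified values 4.7 (COA) and 4.0 (MOM).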


4 A Hierarchical Approach in Design of Fuzzy 
Controllers 


Berenji et al. [8] have proposed the following algorithm for the design of
fuzzy controllers with multiple goals. The algorithm has been applied to the
control of a cart-pole balancing system.

1. Let $G = \{g_1, g_2, \ldots, g_n\}$ be the set of goals that the system
should achieve and maintain. Notice that for n = 1 (i.e., no interacting
goals), the problem becomes simpler and may be handled using the earlier
methods in fuzzy control (e.g., see [36]).

2. Let G = p(G) where p is a function which assigns priorities among
the goals. We assume that such a function can be obtained in
a particular domain. In many control problems, it is possible to
specifically assign priorities to the goals. For example, in the simple
problem of balancing a pole on the palm of a hand and also moving
the pole to a pre-determined location, it is possible to do this by
first keeping the pole as vertical as possible and then gradually
moving to the desired location. Although these goals are highly
interactive (i.e., as soon as we notice that the pole is falling, we
may temporarily set aside the other goal of moving to the desired
location), we still can assign priorities fairly well.

3. Let $U = \{u_1, u_2, \ldots, u_n\}$ where $u_i$ is the set of input control
parameters related to achieving $g_i$.

4. Let $A = \{a_1, a_2, \ldots, a_n\}$ where $a_i$ is the set of linguistic
values used to describe the values of the input control parameters in $u_i$.

5. Let $C = \{c_1, c_2, \ldots, c_n\}$ where $c_i$ is the set of linguistic
values used to describe the values of the output Z.

6. Acquire the rule set $R_1$ of approximate control rules directly related
to the highest priority goal. These rules are in the general form of

IF $u_1$ is $a_1$ THEN Z is $c_1$.

7. For i = 2 to n, subsequently form the rule sets $R_i$. The format of
the rules in these rule sets is similar to the ones in the previous
step except that they include aspects of approximately achieving
the previous goal:

IF $g_{i-1}$ is approximately achieved and $u_i$ is $a_i$ THEN Z is $c_i$.

The approximate achievement of a goal in step 7 of the above algorithm
refers to holding the goal parameters within smaller boundaries.
The interactions among the goal $g_i$ and goal $g_{i-1}$ are handled by
forming rules which include more preconditions in the left hand side. For
example, let us assume that we have acquired a set of rules $R_1$ for
keeping a pole vertical. In writing the second rule set $R_2$ for moving to a
pre-specified location, aspects of approximately achieving $g_1$ should be
combined with control parameters for achieving $g_2$. For example, a
precondition such as the pole is almost balanced can be added while writing
the rules for moving to a specific location. A fuzzy set operation known
as concentration [78] as described earlier can be used here to
systematically obtain more focused membership functions for the parameters
which represent the achievement of previous goals. The above algorithm
has been applied in cart-pole balancing and more details can be found
in [8].

5 Applications of Fuzzy Logic Controllers 

In recent years, there has been a very significant increase in the number
of applications of fuzzy logic control. Although we will not provide a
complete list of the applications here, a selection of both
laboratory prototype systems and real commercial applications will be
discussed.

As mentioned earlier, Mamdani and Assilian [36] were the first to
apply the fuzzy set theory to control problems (e.g., the control of a
laboratory steam engine). This experiment triggered a number of other
applications such as the warm water process control [22], activated sludge
wastewater treatment [63], and traffic junction control [42]. Fuzzy logic
control has also been applied in a diverse set of domains such as arc
welding [38], refuse incineration [40], automobile speed control [37], model
cars [55, 53, 51, 52], cement kiln control [65], aircraft flight control [30],
robot control [34, 59, 19, 66, 41], water purification process control [70],
nuclear reactor control [9], elevator control [33], process control [13, 44],
adaptive control [15], automatic tuning [39], control of a liquid level rig
[14], automobile transmission control [20], gasoline refinery catalytic
reformer control [3], two-dimensional ping-pong game playing [18], control
of biological processes [12], activated sludge plant [76], knowledge
structure [46], and comparison with classical control theory [4, 58]. Among
these, the cement kiln controller [65] was the first industrial application.
Hitachi’s celebrated automatic train controller is among the more
recent fielded applications of fuzzy logic control. In the following, we
discuss a few of these systems in more detail.

5.1 Automatic train control 

Yasunobu and Miyamoto at Hitachi, Ltd. [73] have designed a fuzzy
controller for the Automatic Train Operation (ATO) systems. This
system has been in use in the city of Sendai, Japan since July 1987. The
two main operations of the system are Constant Speed Control (CSC)
and Train Automatic Stop Control (TASC). The CSC operation results
in maintaining a constant target speed (specified by the operator at the
start of the train operation) during the train travel. The TASC
operation controls the speed of the train in order to stop the train at the
prespecified location. The system uses only a few rules (i.e., 12 rules for
each of the CSC and the TASC operations) and the control is evaluated
every 100 milliseconds. These operations use the evaluation of safety,
riding comfort, traceability of target velocity, accuracy of stop gap, running
time and energy consumption criteria in deciding a control strategy. The
control rules are of the predictive fuzzy control rule type, of the form:

IF (u is $C_i$ → x is $A_i$ and y is $B_i$) THEN u is $C_i$.

For example, when the train is in the TASC zone, the following rule 
is used: 

If the control notch is not changed and 

the train will stop at the predetermined location, then 

the control notch is not changed. 

The system performs as skillfully as human experts do and is superior
to an ordinary PID³ automatic train operation controller in terms of
stopping precision, energy consumption, riding comfort, and running
time.

5.2 Fuzzy Logic Hardware and Fuzzy Logic Computer
Chips

Yamakawa [71, 72] has pioneered using fuzzy logic at the hardware level
by developing systems which achieve information processing leading
toward what is referred to as the fuzzy computer. The systems developed
by Yamakawa can accept linguistic information and perform
approximate reasoning-based inference at very high speeds (e.g., more than 10
million Fuzzy Logical Inferences Per Second or FLIPS). Among the many
applications of Yamakawa’s fuzzy electronic circuits is the control of an
inverted pendulum of a short length (e.g., 15 cm, weighing 3.5 grams).
Also, Yamakawa’s computer chips have been used in biomedical
experiments [57] and orthodontic results evaluation [75].

Fuzzy Logic Chips were first developed by Togai and Watanabe at
AT&T Bell Labs [61]. Since the original design, several extensions have
been provided in [60] and [67].

3 Proportional, Integral, and Derivative.

5.3 Sugeno’s model car

Sugeno has designed a model car which can automatically park itself in a
garage. The fuzzy control rules are derived by modeling a human’s
parking actions and developed based on the third method described earlier
in 3.2. The car uses the front wall distance (x), side wall distance (y),
and the heading angle of the car ($\theta$) as its input variables. Three output
control variables are used: the angle of the front wheels in moving
forward (backward) and a control variable for speed control. For example,
a control rule for steering control in moving forward is:

If x is A, y is B, $\theta$ is C then $f = p_0 + p_1 x + p_2 y + p_3 \theta$.

Eighteen control rules are used for the steering control in moving forward
and sixteen control rules are used for the steering control in moving
backward. The input-output data is collected while a human parks the
car and is used in parameter identification (e.g., the identification of
the coefficients $p_0$, $p_1$, $p_2$, and $p_3$) of the above rules. Many
successful experiments have been done using the model car which is equipped
with a microprocessor and sensing devices.

5.4 Sugeno’s model helicopter 

Sugeno has initiated several projects on applying fuzzy logic control to 
the control of a model helicopter. Among these are radio control by 
oral instructions, automatic autorotation entry in engine failure cases, 
and unmanned helicopter control for sea rescue [48]. Although these 
projects have just started, several interesting results have already been 
achieved. The input variables from the helicopter include pitch, roll, and 
yaw, and their first and second derivatives. The control rules written for
the helicopter regulate the up/down, forward/backward, left/right, and
nose direction. For example, the longitudinal stick controls pitch, and
therefore forward/backward movement of the rotorcraft.

An example of a fuzzy control rule for hovering is the following:

If the body rolls, then 
control the lateral in reverse. 

Or as another example for hovering control: 

If the body pitches, then
control the longitude in reverse.

The helicopter control problems under study in these projects are
challenging control problems and are already producing results which
illustrate the strength of the fuzzy logic control technology.

5.5 Other recent applications 

Fuzzy logic control has also found its way into household appliances.
Examples, which will not be discussed further here, include the air
conditioner by Mitsubishi; the washing machine by Matsushita and Hitachi;
the VCR by Sanyo and Matsushita; the vacuum cleaner by Matsushita; the
palmtop computer by Sony; the microwave oven by Toshiba, Sharp, Sanyo,
and Hitachi; the camera by Canon; and many others.

6 Learning Fuzzy Logic Controllers 

In many controller design tasks, it is important to develop a controller 
which can learn from experience to improve its performance. Here we 
discuss the research on developing Self Organizing Controllers (SOCs) 
and the recent work in using neural networks to provide a learning at- 
tribute for the fuzzy logic controllers. 

6.1 Self Organizing Controllers 

The Self-Organizing Controllers (SOCs) are among the earliest fuzzy
controllers able to change their control policies with respect to the
process and the operating environment. The function of the SOC is a
combined system identification and control task. From the error and the
change in error, it infers the change in control action to apply. In
addition to the error and change-of-error scales supplied by a control
designer, the SOC uses a third scale to calculate the actual change in
the process input. In this sense, the SOC's three scales are similar to
the three gain parameters of a conventional PID (proportional, integral,
and derivative) controller.

SOCs evaluate their performance using a local measure which assesses
the performance over a small set of plant states and a global criterion
which measures the overall performance. The performance measure resembles
a human decision maker who determines the output correction
required from a knowledge of the error and the change of error. However,
the decision tables required by the SOCs have to be generated and
stored beforehand. For processes more complex than the ones discussed
by Procyk and Mamdani [44] (i.e., other than single-input single-output
processes), the generation of these decision tables may be difficult.

6.2 Neural Networks in Fuzzy Logic Control 

Similarities exist between the neural networks and the fuzzy logic con- 
trollers. They both can handle extreme nonlinearities in the system. 
Both techniques allow interpolative reasoning which frees us from the 
true/false restriction of logical systems such as the ones used in sym- 
bolic AI. For example, once a neural network has been trained for a set 
of data, it can interpolate and produce answers for the cases not present 
in the training data set. Similar properties hold for a fuzzy logic con- 
troller. The weighted average scheme of fuzzy control and the sum of 
the products of the neural nets are similar in principle. 

The main idea in integrating fuzzy logic control with neural networks 
is to use the strength of each one collectively in the resulting neuro-fuzzy
control system. This fusion allows: 

1. A human understandable expression of the knowledge used in con- 
trol in terms of the fuzzy control rules. This reduces the difficulties 
in describing the trained neural network which is usually treated 
as a black box. 

2. The fuzzy controller learns to adjust its performance automatically 
using a neural network structure and hence learns by accumulating 
experience. 

The main emphasis in the research so far has been on automatic de- 
sign and fine-tuning of the membership functions used in fuzzy control 
through learning by neural networks. Here, we focus on only three hybrid 
models but many more references are available (such as the proceedings 
of Iizuka-88 and Iizuka-90 conferences [1, 2]). 

6.3 Fuzzy Control and AHC 

Lee and Berenji [32] and Lee [31] have combined the Adaptive Heuristic
Critic (AHC) model of Barto, Sutton, and Anderson with fuzzy control
to learn the membership functions of the conclusions of the control rules.
This work builds on Barto et al.’s pioneering work on applying neural
networks in control. Two neural-like elements are used in this model.
The Adaptive Heuristic Critic learns by updating the predictions of the
system’s failure over time. The integrated fuzzy-AHC model has been
tested in the domain of cart-pole balancing and the results have been
consistently better when compared with the performance of the AHC
model alone (e.g., in terms of speed of learning and smoothness of control).
However, this model is difficult to apply to other control systems,
mainly because developing the mathematical functions for the trace
function and credit assignment is not trivial. The structure proposed
here lacks generality and may be difficult to apply to larger scale
systems.

Figure 7: The ARIC Architecture

6.4 Fuzzy Control and two layer networks 

Berenji [7, 6] has proposed a Neuro-Fuzzy Controller (ARIC) architecture
which extends the Fuzzy-AHC model mentioned earlier. Figure 7
presents the architecture of the proposed ARIC model. In ARIC, two
networks replace the two neural-like elements. These networks are referred
to as the Action-state Evaluation Network (AEN) and Action Selection
Network (ASN). The AEN and ASN networks are multi-layered
neural networks which are based on the back propagation algorithm and
reinforcement learning.

6.5 Use of Clustering

Takagi and Hayashi [54] have presented an algorithm for combining neu- 
ral networks with fuzzy logic which consists of three parts. The first 
part finds a suitable partition of the training data by using a clustering 
algorithm. Once the best partition of the data is found, then the sec- 
ond part of the algorithm identifies the membership functions of the IF 
parts of the rules. The last step of the algorithm is to determine the 
amount of control output for each rule. A neural network is used for 
the identification of the membership functions in the second step above. 
After this network is trained, it assigns the correct rule number to the 
combination of different sensor readings. To identify the THEN part of 
a rule, a backward elimination method is used. This method arbitrarily 
eliminates an input variable and the neural network of each THEN part 
is retrained. This process determines the input variable with minimal 
effects. 

Takagi and Hayashi’s model is very similar to Berenji’s work in inductive
learning and fuzzy control [5] where an AI clustering method (C4,
a descendant of Quinlan’s ID3 algorithm) is used. In Berenji’s model,
a decision theoretic measure is used to decide which input variable the
decision tree should use to branch on next. The number of leaves of the
resulting decision tree will indicate the number of control rules needed.

6.6 Other Research 

Among other work in this area is Kosko’s work on Fuzzy Cognitive Maps 
(FCM) [27]. The FCMs are graphical representations of the causal rela- 
tionships between different factors, where the weights on the links rep- 
resent the positive or negative causal relationships. Also, Kosko’s Fuzzy 
Associative Memories (FAM) [28] can map fuzzy input patterns into 
stored fuzzy output patterns and hence are useful tools in representing 
the knowledge base of a fuzzy controller. 

Fuzzy logic controllers and neural network controllers are comple- 
mentary, and it is expected that the amount of research into hybrid ap- 
proaches will grow significantly in the next few years, especially in Japan, 
where many applications have already been reported using a combina- 
tion of these methods. In the U.S., NASA has taken an active role in 
integrating these two powerful techniques (e.g., sponsorship of the First 
and Second International Conferences on Neural Networks and Fuzzy 
Logic at Johnson Space Center in 1988 and 1990). 





7 Discussion 

Among the problems which still deserve serious attention is the problem 
of providing proof of stability for FLCs. In contrast to the analytical 
control theory, FLC lacks this necessary attribute although some theo- 
retical work has begun producing interesting results (e.g., [29, 26, 10, 11, 
45, 17, 16, 68, 24, 62, 21, 43, 74, 47, 25]). 

Another area that requires attention is what we referred to as fuzzy
modeling of systems earlier in this paper. Here the attention should be 
focused on structure identification and parameter identification of the 
dynamics of a system in order to develop a model which could later be 
used to develop the fuzzy logic controller [56, 50]. 

Finally, as we briefly discussed in the previous section, artificial neu- 
ral networks and fusion techniques are being developed in order to de- 
velop fuzzy logic controllers which can learn from experience. Despite 
these open issues, fuzzy logic control has achieved a huge commercial 
success in recent years. Because these controllers are easy to manufac- 
ture and greatly resemble human reasoning, it is expected that there will 
be many more applications in the near future.


METHODS AND APPLICATIONS

OF 

FUZZY MATHEMATICAL 
PROGRAMMING 

H.-J. Zimmermann 
RWTH Aachen 
Templergraben 55 
W-5100 Aachen (Germany) 

1. INTRODUCTION 

In spite of other uses of the term "mathematical programming" it shall 
be interpreted here as it is normally done in Operations Research, i.e. an 
algorithmic approach to solving models of the type 

$$
\begin{array}{ll}
\text{maximize}  & f(x) \\
\text{such that} & g_i(x) = 0, \quad i = 1,\dots,m
\end{array}
\tag{1}
$$

Depending on the mathematical character of the objective function,
$f(x)$, and the constraints, $g_i(x)$, many types of mathematical programming
algorithms exist, such as linear programming, quadratic programming,
fractional programming, convex programming, etc. As an example, we shall use
the simplest and most commonly used type, i.e. linear programming, which
focuses on the model

$$
\begin{array}{ll}
\text{maximize}  & f(x) = z = c^T x \\
\text{such that} & Ax \le b \\
                 & x \ge 0
\end{array}
\tag{2}
$$

with $c, x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, $A \in \mathbb{R}^{m \times n}$.


In this model it is normally assumed that all coefficients of A, b, and c
are real (crisp) numbers; that "≤" is meant in a crisp sense, and that "maximize"
is a strict imperative. This also implies that the violation of any single
constraint renders the solution infeasible and that all constraints are of equal
importance (weight). Strictly speaking, these are rather unrealistic
assumptions, which are partly relaxed in "fuzzy linear programming".

If we assume that the LP-decision has to be made in fuzzy 
environments, quite a number of possible modifications of (2) exist. First of all, 
the decision maker might really not want to actually maximize or minimize the 
objective function. Rather he might want to reach some aspiration levels which 
might not even be definable crisply. Thus he might want to "improve the 
present cost situation considerably," and so on. 

Secondly, the constraints might be vague in one of the following ways: 
The ≤ sign might not be meant in the strictly mathematical sense but smaller
violations might well be acceptable. This can happen if the constraints 
represent aspiration levels as mentioned above or if, for instance, the 
constraints represent sensory requirements (taste, color, smell, etc.) which 
cannot adequately be approximated by a crisp constraint. Of course, the 
coefficients of the vectors b or c or of the matrix A itself can have a fuzzy 
character either because they are fuzzy in nature or because perception of them 
is fuzzy. 

Finally the role of the constraints can be different from that in classical 
linear programming where the violation of any single constraint by any amount 
renders the solution infeasible. The decision maker might accept small
violations of different constraints. Fuzzy linear programming offers a number 
of ways to allow for all those types of vagueness and we shall discuss some of 
them below. 

Before we develop a specific model of linear programming in a fuzzy 
environment it should have become clear, that by contrast to classical linear 
programming "fuzzy linear programming" is not a uniquely defined type of 
model but that many variations are possible, depending on the assumptions or 
features of the real situation to be modelled. 

Essentially two "families" of models can be distinguished: One 
interprets "fuzzy mathematical programming" as a specific decision making 
environment to which Bellman and Zadeh’s definition of a "decision in fuzzy 
environments" [1970] can be applied. The other considers components of 
model (2) as fuzzy, makes certain assumptions, for instance, about the type of 
fuzzy sets which as fuzzy numbers replace the crisp coefficients in A, b, or c, and 
then solves the resulting mathematical problem. The former approach seems to
us the more application oriented one. From experience in applications a 
decision maker seems to find it much easier to describe fuzzy constraints or to 
establish aspiration levels for the objective(s) than to specify a large number of 
fuzzy numbers for A, b, or c. We shall, therefore, first describe the first 
approach and then elaborate on the other approaches.

2. Fuzzy Mathematical Programming

2a Symmetric Fuzzy Linear Programming 

As mentioned above Fuzzy LP is considered as a special case of a 
decision in a fuzzy environment. The basis in this case is the definition 
suggested by Bellman and Zadeh [1970]: 

Definition 1: 


Assume that we are given a fuzzy goal G and a fuzzy constraint C in a
space of alternatives X. Then G and C combine to form a decision, D, which is
a fuzzy set resulting from the intersection of G and C. In symbols, $D = G \cap C$ and
correspondingly

$$\mu_D = \min(\mu_G, \mu_C).$$

More generally, suppose that we have n goals $G_1, \dots, G_n$ and m
constraints $C_1, \dots, C_m$. Then, the resultant decision is the intersection of the
given goals $G_1, \dots, G_n$ and the given constraints $C_1, \dots, C_m$. That is,

$$D = G_1 \cap G_2 \cap \dots \cap G_n \cap C_1 \cap C_2 \cap \dots \cap C_m$$

and correspondingly

$$\mu_D = \min\{\mu_{G_1}, \mu_{G_2}, \dots, \mu_{G_n}, \mu_{C_1}, \mu_{C_2}, \dots, \mu_{C_m}\}
       = \min\{\mu_{G_i}, \mu_{C_j}\} = \min\{\mu_i\}.$$

This definition implies: 

1. The "and" connecting goals and constraints in the model corresponds to 
the "logical and". 

2. The logical "and" corresponds to the set theoretic intersection. 

3. The intersection of fuzzy sets is defined in the possibilistic sense by the
min-operator.
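
To make Definition 1 concrete, here is a minimal Python sketch over a finite set of alternatives. The alternatives and their membership grades are invented for illustration; choosing the alternative with the highest grade in D is shown only as one natural way of arriving at a crisp choice.

```python
# Hypothetical alternatives with invented membership grades.
mu_goal = {"a": 0.9, "b": 0.6, "c": 0.3}
mu_constraint = {"a": 0.2, "b": 0.7, "c": 1.0}


def decision(*fuzzy_sets):
    """Fuzzy decision D: pointwise minimum of goals and constraints."""
    alternatives = fuzzy_sets[0].keys()
    return {x: min(s[x] for s in fuzzy_sets) for x in alternatives}


mu_d = decision(mu_goal, mu_constraint)

# An alternative with the highest degree of membership in D.
best = max(mu_d, key=mu_d.get)
```

Because `decision` treats all of its arguments alike, goals and constraints enter in exactly the same way, which is the symmetry of the model.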

For the time being we shall accept these assumptions. An important 
feature of this model is also its symmetry, i.e. the fact that, eventually, it does 
not distinguish between constraints and objectives. This feature is not 
considered adequate by all authors (see, for instance, Asai et al. [1975]). We
feel, however, that this models the real behaviour of decision makers quite well.

If we assume that the decision maker can establish in model (2) an
aspiration level, z, of the objective function, which he wants to achieve as far as
possible and if the constraints of this model can be slightly violated - without
causing infeasibility of the solution - then model (2) can be written as

$$
\begin{array}{ll}
\text{Find}      & x \\
\text{such that} & c^T x \gtrsim z \\
                 & Ax \lesssim b \\
                 & x \ge 0
\end{array}
\tag{3}
$$

Here $\lesssim$ denotes the fuzzified version of $\le$ and has the linguistic
interpretation "essentially smaller than or equal." $\gtrsim$ denotes the fuzzified
version of $\ge$ and has the linguistic interpretation "essentially greater than or
equal." The objective function in (2) might have to be written as a minimizing
goal in order to consider z as an upper bound.

We see that (3) is fully symmetric with respect to objective function
and constraints and we want to make that even more obvious by substituting
$\begin{pmatrix} -c^T \\ A \end{pmatrix} = B$ and $\begin{pmatrix} -z \\ b \end{pmatrix} = d$. Then (3) becomes:

$$
\begin{array}{ll}
\text{Find}      & x \\
\text{such that} & Bx \lesssim d \\
                 & x \ge 0
\end{array}
\tag{4}
$$

Each of the (m + 1) rows of (4) shall now be represented by a fuzzy set,
the membership functions of which are $\mu_i(x)$. The membership function of
the fuzzy set "decision" of model (4) is

$$\mu_D(x) = \min_i \{\mu_i(x)\}. \tag{5}$$

$\mu_i(x)$ can be interpreted as the degree to which x fulfills (satisfies) the fuzzy
inequality $B_i x \lesssim d_i$ (where $B_i$ denotes the ith row of B).

Assuming that the decision maker is interested not in a fuzzy set but in
a crisp "optimal" solution we could suggest the "maximizing solution" to (5),
which is the solution to the possibly nonlinear programming problem

$$\max_{x \ge 0} \min_i \{\mu_i(x)\} = \max_{x \ge 0} \mu_D(x). \tag{6}$$

Now we have to specify the membership functions $\mu_i(x)$. $\mu_i(x)$ should
be 0 if the constraints (including objective function) are strongly violated, and 1
if they are very well satisfied (i.e., satisfied in the crisp sense); and $\mu_i(x)$ should
increase monotonically from 0 to 1, that is:

$$
\mu_i(x) =
\begin{cases}
1          & \text{if } B_i x \le d_i \\
\in [0, 1] & \text{if } d_i < B_i x \le d_i + p_i \\
0          & \text{if } B_i x > d_i + p_i
\end{cases}
\qquad i = 1,\dots,m+1
\tag{7}
$$

Using the simplest type of membership function we assume them to be linearly
increasing over the "tolerance interval" $p_i$:

$$
\mu_i(x) =
\begin{cases}
1                             & \text{if } B_i x \le d_i \\
1 - \dfrac{B_i x - d_i}{p_i}  & \text{if } d_i < B_i x \le d_i + p_i \\
0                             & \text{if } B_i x > d_i + p_i
\end{cases}
\qquad i = 1,\dots,m+1
\tag{8}
$$

The $p_i$ are subjectively chosen constants of admissible violations of the
constraints and the objective function. Substituting (8) into (6) yields, after
some rearrangements [Zimmermann 1976] and with some additional
assumptions,

$$\max_{x \ge 0} \min_i \left( 1 - \frac{B_i x - d_i}{p_i} \right). \tag{9}$$

Introducing one new variable, $\lambda$, which corresponds essentially to (5),
we arrive at

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & \lambda p_i + B_i x \le d_i + p_i, \quad i = 1,\dots,m+1 \\
                 & x \ge 0
\end{array}
\tag{10}
$$

If the optimal solution to (10) is the vector $(\lambda_0, x_0)$, then $x_0$ is the
maximizing solution (6) of model (2) assuming membership functions as
specified in (8).

The reader should realize that this maximizing solution can be found 
by solving one standard (crisp) LP with only one more variable and one more 
constraint than model (4). This makes this approach computationally very 
efficient. 
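
The step from the max-min problem (6) to the linear program (10) can be spelled out as follows. This is a short derivation under the linear membership functions of (8), added for convenience; it assumes $p_i > 0$.

```latex
\begin{align*}
\max_{x \ge 0} \min_i \mu_i(x)
  &= \max \bigl\{ \lambda \;:\; \lambda \le \mu_i(x),\ i = 1,\dots,m+1,\ x \ge 0 \bigr\}, \\[4pt]
\lambda \le 1 - \frac{B_i x - d_i}{p_i}
  &\iff \lambda p_i \le p_i - B_i x + d_i \\
  &\iff \lambda p_i + B_i x \le d_i + p_i .
\end{align*}
```

Each fuzzy row thus contributes exactly one crisp linear row. Note that the linear expression coincides with $\mu_i(x)$ only on the tolerance interval; treating it as valid everywhere is presumably among the "additional assumptions" mentioned above.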

A slightly modified version of models (9) and (10), respectively, results
if the membership functions are defined as follows: A variable $t_i$, $i = 1,\dots,m+1$,
$0 \le t_i \le p_i$, is defined which measures the degree of violation of the ith
constraint. The membership function of the ith row is then

$$\mu_i(x) = 1 - \frac{t_i}{p_i}. \tag{11}$$

The crisp equivalent model is then

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & \lambda p_i + t_i \le p_i, \quad i = 1,\dots,m+1 \\
                 & B_i x - t_i \le d_i \\
                 & t_i \le p_i \\
                 & x, t \ge 0
\end{array}
\tag{12}
$$

This model is larger than model (10) even though the set of constraints
$t_i \le p_i$ is actually redundant. Model (12) has some advantages, however, in
particular when performing sensitivity analysis.

The main advantage, compared to the unfuzzy problem formulation, is 
the fact that the decision maker is not forced into a precise formulation 
because of mathematical reasons, even though he might only be able or willing 
to describe his problem in fuzzy terms. Linear membership functions are 
obviously only a very rough approximation. Membership functions which 
monotonically increase or decrease, respectively, in the interval $[d_i, d_i + p_i]$
can also be handled quite easily, as will be shown later. 

It should also be observed that the classical assumption of equal 
importance of constraints has been relaxed: the slope of the membership 
functions determines the "weight" or importance of the constraint. The slopes, 
however, are determined by the $p_i$’s: the smaller the $p_i$, the higher the
importance of the constraint. For $p_i = 0$ the constraint becomes crisp, i.e. no
violation is allowed.
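
The effect of the tolerance on the "weight" of a constraint can be seen in a small Python sketch of the linear membership function (8). The right-hand side of 10 and the violation of one unit are invented for illustration.

```python
def mu_linear(bx, d, p):
    """Linear membership (8) of a row B_i x <~ d_i with tolerance p_i."""
    if bx <= d:
        return 1.0            # satisfied in the crisp sense
    if p == 0 or bx > d + p:
        return 0.0            # crisp constraint violated, or beyond the tolerance
    return 1.0 - (bx - d) / p


# The same violation of one unit, judged under shrinking tolerances.
violation = 1.0
grades = {p: mu_linear(10.0 + violation, 10.0, p) for p in (4.0, 2.0, 1.0, 0.0)}
```

The same violation is penalized more heavily as the tolerance shrinks (grades 0.75, 0.5, 0.0), and with a tolerance of zero any violation at all gives a grade of 0.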

So far, the objective function as well as all constraints were considered
fuzzy. If some of the constraints are crisp, $Dx \le b$, then these constraints can
easily be added to formulations (10) or (12), respectively. Thus (10) would, for
instance, become:

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & \lambda p_i + B_i x \le d_i + p_i \\
                 & Dx \le b \\
                 & x, \lambda \ge 0
\end{array}
\tag{13}
$$

2B LINEAR PROGRAMS WITH FUZZY CONSTRAINTS AND 
CRISP OBJECTIVE FUNCTIONS 

So far, it has been assumed that the objective function could be 
calibrated by a given z and then formulated as a fuzzy set, resulting in the 
symmetrical model formulation. It might, however, not be possible to find in a 
natural way the required z. In this case the symmetry of the model can be 
gained by applying a specialisation of Zadeh’s "maximizing set" to the objective
function:

Definition 2: [Werners 1984]

Let $f: X \to \mathbb{R}^1$ be the objective function, R = fuzzy feasible region, S(R) =
support of R, and $R_1$ = α-level cut of R for α = 1. The membership function of
the goal (objective function) given solution space R is then defined as

$$
\mu_G(x) =
\begin{cases}
0 & \text{if } f(x) \le \sup_{R_1} f \\[4pt]
\dfrac{f(x) - \sup_{R_1} f}{\sup_{S(R)} f - \sup_{R_1} f}
  & \text{if } \sup_{R_1} f < f(x) < \sup_{S(R)} f \\[8pt]
1 & \text{if } \sup_{S(R)} f \le f(x)
\end{cases}
$$

The corresponding membership function in functional space is then

$$
\mu_G(r) :=
\begin{cases}
\sup_{x \in f^{-1}(r)} \mu_G(x) & \text{if } r \in \mathbb{R},\ f^{-1}(r) \ne \emptyset \\
0 & \text{else}
\end{cases}
\tag{13A}
$$


Adding this fuzzy set to the fuzzy sets defining the solution space gives 
again a symmetrical model to which (10) or (12) can be applied. Definition 2 
becomes easier to understand if we apply it to a specific given LP-structure: 


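Let us modify (3) by adding a set of crisp constraints, $Dx \le b'$, and
changing the objective function to maximize f(x). This yields model

$$
\begin{array}{ll}
\text{maximize}  & f(x) = c^T x \\
\text{such that} & \left.
                   \begin{array}{l}
                   Ax \lesssim b \\
                   Dx \le b' \\
                   x \ge 0
                   \end{array}
                   \right\} R
\end{array}
\tag{14}
$$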


Let the membership functions of the fuzzy sets representing the fuzzy
constraints be defined in analogy to (8) as

$$
\mu_i(x) =
\begin{cases}
1                               & \text{if } A_i x \le b_i \\
\dfrac{b_i + p_i - A_i x}{p_i}  & \text{if } b_i < A_i x \le b_i + p_i \\
0                               & \text{if } A_i x > b_i + p_i
\end{cases}
\tag{15}
$$

On the basis of the two LP’s following, the membership function of the
fuzzy set defined in definition 2 can then easily be defined:

$$
\begin{array}{ll}
\text{maximize}  & f(x) = c^T x \\
\text{such that} & Ax \le b \\
                 & Dx \le b' \\
                 & x \ge 0
\end{array}
\tag{16}
$$

The optimal solution of this model is $f_1 = \sup_{R_1} f = (c^T x)_{opt}$.

$$
\begin{array}{ll}
\text{maximize}  & f(x) = c^T x \\
\text{such that} & Ax \le b + p \\
                 & Dx \le b' \\
                 & x \ge 0
\end{array}
\tag{17}
$$

The optimal solution of this model is $f_0 = \sup_{S(R)} f = (c^T x)_{opt}$.


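The membership function is therefore

$$
\mu_G(x) =
\begin{cases}
1                               & \text{if } f_0 \le c^T x \\[4pt]
\dfrac{c^T x - f_1}{f_0 - f_1}  & \text{if } f_1 < c^T x < f_0 \\[8pt]
0                               & \text{if } c^T x \le f_1
\end{cases}
\tag{18}
$$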


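The "equivalent" model to (14) is therefore:

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & \lambda (f_0 - f_1) - c^T x \le -f_1 \\
                 & \lambda p + Ax \le b + p \\
                 & Dx \le b' \\
                 & \lambda \le 1 \\
                 & \lambda, x \ge 0
\end{array}
\tag{19}
$$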



Example:

Consider the LP-Model

$$
\begin{array}{ll}
\text{maximize}  & z = 2x_1 + x_2 \\
\text{such that} & x_1 \le 3 \\
                 & x_1 + x_2 \le 4 \\
                 & 0.5 x_1 + x_2 \le 3 \\
                 & x_1, x_2 \ge 0
\end{array}
$$

The "tolerance intervals" of the constraints are $p_1 = 6$, $p_2 = 4$, $p_3 = 2$.
$f_1$ and $f_0$ can be determined to be 7 and 16, respectively. Hence, model (19) is

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & 9\lambda - 2x_1 - x_2 \le -7 \\
                 & 6\lambda + x_1 \le 9 \\
                 & 4\lambda + x_1 + x_2 \le 8 \\
                 & 2\lambda + 0.5 x_1 + x_2 \le 5 \\
                 & \lambda \le 1 \\
                 & \lambda, x_1, x_2 \ge 0
\end{array}
$$

The solution to this problem is $x_1^0 = 5.84$, $x_2^0 = 0.05$, $\lambda^0 = .52$.
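
The example can be checked with a short Python sketch that builds model (19) from the data and solves it exactly. It assumes that the third constraint reads $0.5x_1 + x_2 \le 3$, which is the reading consistent with the stated bounds 7 and 16 and with the rows of the equivalent model. The tiny vertex-enumeration solver is a stand-in chosen to keep the sketch self-contained; it is not the simplex method and is only suitable for very small bounded problems.

```python
from fractions import Fraction as F
from itertools import combinations


def solve_square(rows, rhs):
    """Solve a square linear system exactly (Gauss-Jordan); None if singular."""
    n = len(rows)
    m = [[F(v) for v in r] + [F(bv)] for r, bv in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [row[n] for row in m]


def maximize(c, A, b):
    """Maximize c.x subject to A x <= b by enumerating all vertices."""
    best = None
    for idx in combinations(range(len(A)), len(c)):
        x = solve_square([A[i] for i in idx], [b[i] for i in idx])
        if x is None:
            continue
        if all(sum(a * v for a, v in zip(row, x)) <= bi for row, bi in zip(A, b)):
            value = sum(ci * v for ci, v in zip(c, x))
            if best is None or value > best[0]:
                best = (value, x)
    return best


c = [2, 1]
A = [[1, 0], [1, 1], [F(1, 2), 1]]   # third row assumed to be 0.5*x1 + x2
b = [3, 4, 3]
p = [6, 4, 2]
nonneg = [[-1, 0], [0, -1]]          # -x1 <= 0, -x2 <= 0

# Models (16) and (17): crisp bounds without and with the full tolerances.
f1, _ = maximize(c, A + nonneg, b + [0, 0])
f0, _ = maximize(c, A + nonneg, [bi + pi for bi, pi in zip(b, p)] + [0, 0])

# Model (19) in the variables (lam, x1, x2).
rows = [[f0 - f1] + [-ci for ci in c]]
rhs = [-f1]
rows += [[pi] + row for pi, row in zip(p, A)]
rhs += [bi + pi for bi, pi in zip(b, p)]
rows += [[1, 0, 0], [-1, 0, 0], [0, -1, 0], [0, 0, -1]]
rhs += [1, 0, 0, 0]

lam, (_, x1, x2) = maximize([1, 0, 0], rows, rhs)
```

Under this reading the sketch gives $f_1 = 7$ and $f_0 = 16$, and the exact optimum is $\lambda = 10/19 \approx 0.526$, $x_1 = 111/19 \approx 5.84$, $x_2 = 1/19 \approx 0.05$.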

Some authors suggest not using a "symmetrical" approach but rather
computing a fuzzy set "decision". They compute the optimal values of the
objective function for all α-level sets of the solution space. The membership
function of the "decision" is then defined to be the α’s corresponding to the
respective optimal values of the objective function [Orlovski 1977].

In a certain sense this philosophy is similar to that of those authors
who suggest determining for model (4) not a crisp solution (6) but the fuzzy set
"decision". To do that, a parametric linear programming problem has to be solved
[Chanas 1983]. Even though this approach leads to quite impressive results in
the 2-dimensional case, it is rather questionable whether the decision maker
can make use of it in a realistically sized problem.


2C EXTENSIONS 

So far, two major assumptions have been made in order to arrive at
"equivalent models" which can be solved efficiently by standard LP-methods:

1. Linear membership functions were assumed for all fuzzy sets involved.

2. The use of the minimum-operator for the aggregation of fuzzy sets was
considered to be adequate.

The relaxation of these two assumptions leads to complications whose
severity depends on the type of relaxation:

1. Nonlinear membership functions 


The linear membership functions used so far could all be defined by 
fixing two points, the upper and lower aspiration levels or the two bounds of 
the tolerance interval. The most obvious way to handle nonlinear membership 
functions is probably to approximate them piece-wise by linear functions. Some 
authors [Hannan 1981; Nakamura 1985] have used this approach and shown 
that the resulting equivalent crisp problem is still a standard linear 
programming problem. 

This problem, however, can be considerably larger than model (10) 
because in general one constraint will have to be added for each "linear piece" 
of the approximation. Quite often S-shaped membership functions have been 
suggested, particularly if the membership function is interpreted as a kind of 
utility function (representing the degree of satisfaction, acceptance etc.). 
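Leberling [1981], for instance, suggests such a function which is also uniquely
determined by two parameters. He suggests

$$
\mu_H(x) = \frac{1}{2}\,
\frac{e^{\left(x - \frac{a+b}{2}\right)\delta} - e^{-\left(x - \frac{a+b}{2}\right)\delta}}
     {e^{\left(x - \frac{a+b}{2}\right)\delta} + e^{-\left(x - \frac{a+b}{2}\right)\delta}}
+ \frac{1}{2}
= \frac{1}{2} \tanh\!\left( \left( x - \frac{a+b}{2} \right) \delta \right) + \frac{1}{2}
$$

with $a, b, \delta > 0$. This hyperbolic function has the following formal
properties:

$\mu_H(x)$ is strictly monotonically increasing.

$\mu_H(x) = \frac{1}{2}$ where $x = \frac{a+b}{2}$.

$\mu_H(x)$ is strictly convex on $[-\infty, (a+b)/2]$ and strictly concave on
$[(a+b)/2, +\infty]$.

For all $x \in \mathbb{R}$: $0 < \mu_H(x) < 1$, and $\mu_H(x)$ approaches asymptotically f(x) = 0
and f(x) = 1, respectively.

Leberling shows that choosing as lower and upper aspiration levels for
the fuzzy objective function z = cx of an LP $a = \underline{c}$ (lower bound of z) and $b = \overline{c}$
(upper limit of the objective function), and representing this (fuzzy) goal by a
hyperbolic function, one arrives at the following crisp equivalent problem for
one fuzzy goal and all crisp constraints:

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & \lambda - \dfrac{1}{2}\,\dfrac{e^{Z'(x)} - e^{-Z'(x)}}{e^{Z'(x)} + e^{-Z'(x)}} \le \dfrac{1}{2} \\
                 & Dx \le b' \\
                 & x, \lambda \ge 0
\end{array}
\tag{20}
$$

with $Z'(x) = \left( \sum_j c_j x_j - \frac{1}{2}(\underline{c} + \overline{c}) \right) \delta$. For each additional fuzzy goal or
constraint one of these exponential rows has, of course, to be added to (20).

For $x_{n+1} = \tanh^{-1}(2\lambda - 1)$, model (20) is equivalent to the following
linear model:

$$
\begin{array}{ll}
\text{maximize}  & x_{n+1} \\
\text{such that} & \delta \sum_j c_j x_j - x_{n+1} \ge \dfrac{1}{2}\,\delta\,(\underline{c} + \overline{c}) \\
                 & Dx \le b' \\
                 & x_{n+1}, x \ge 0
\end{array}
\tag{21}
$$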

This is again a standard linear programming model which can be solved, for 
instance, by any available simplex code. 

The above equivalence between models with nonlinear membership 
functions is not accidental. It has been proven that the following relationship 
holds [Werners 1984, p. 143]. 

Theorem 1 


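Let $\{f_k\}$, $k = 1,\dots,K$ be a finite family of functions $f_k: \mathbb{R}^n \to \mathbb{R}^1$, $x^0 \in X \subset \mathbb{R}^n$,
$g: \mathbb{R}^1 \to \mathbb{R}^1$ strictly monotonically increasing and $\lambda, \lambda' \in \mathbb{R}$. Consider the two
mathematical programming problems

$$
\begin{array}{ll}
\text{maximize}  & \lambda \\
\text{such that} & \lambda \le f_k(x), \quad k = 1,\dots,K \\
                 & x \in X
\end{array}
\tag{22}
$$

$$
\begin{array}{ll}
\text{maximize}  & \lambda' \\
\text{such that} & \lambda' \le g(f_k(x)), \quad k = 1,\dots,K \\
                 & x \in X
\end{array}
\tag{23}
$$

If there exists a $\lambda^0 \in \mathbb{R}^1$ such that $(\lambda^0, x^0)$ is the optimal solution of (22) then
there exists a ${\lambda'}^0 \in \mathbb{R}^1$ such that $({\lambda'}^0, x^0)$ is the optimal solution of (23).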

Theorem 1 suggests that quite a number of nonlinear membership 
functions can be accommodated easily. Unfortunately, the same optimism is not
justified concerning other aggregation operators.
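
The content of Theorem 1 can be illustrated numerically with a minimal Python sketch. The finite candidate set X, the two functions $f_k$, and the parameter values of the hyperbolic transform g are invented for illustration; the sketch only checks the claim on this one instance and proves nothing.

```python
import math

# Hypothetical finite candidate set X and two invented functions f_k.
X = [i / 10 for i in range(0, 41)]
fs = [lambda x: 1 - x / 4, lambda x: x / 3]


def g(v, centre=0.5, delta=2.0):
    """A strictly increasing transform of hyperbolic (tanh) shape."""
    return 0.5 * math.tanh((v - centre) * delta) + 0.5


def maximin(funcs):
    """Element of X that maximizes the smallest of the given functions."""
    return max(X, key=lambda x: min(f(x) for f in funcs))


x_plain = maximin(fs)
x_transformed = maximin([lambda x, f=f: g(f(x)) for f in fs])
```

Both problems select the same x; only the optimal value changes, from $\lambda^0$ to $g(\lambda^0)$.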

The computational efficiency of the approach mentioned so far has 
rested to a large extent on the use of the min-operator as a model for the 
logical "and" or the intersection of fuzzy sets, respectively. Axiomatic 
[Hamacher 1978] as well as empirical [Thole, Zimmermann, Zysno 1979, 
Zimmermann, Zysno 1980, 1983] investigations have shed some doubt on the
general use of the min-operator in decision models. Quite a number of context
free or context dependent operators have been suggested in the meantime [see,
e.g., Zimmermann 1990b, ch. 3]. The disadvantage of these operators is,
however, that the resulting crisp equivalent models are no longer linear [see, 
e.g., Zimmermann 1978, p.45], which reduces the computational efficiency of 
these approaches considerably or even renders the equivalent models 
unsolvable within acceptable time limits. There are, however, some exceptions 
to this rule, and we will present two of them in more detail. 

One of the objections against the min-operator (see, for instance, 
Zimmermann and Zysno [1980]) is the fact that neither the logical "and" nor 
the min-operator is compensatory in the sense that increases in the degree of 
membership in the fuzzy sets "intersected" might not influence at all 
membership in the resulting fuzzy set (aggregated fuzzy set or intersection). 
There are two quite natural ways to cure this weakness: 


1. Combine the (limitational) min-operator as model for the logical "and"
with the fully compensatory max-operator as a model for the inclusive "or". For
the former, the product operator might be used alternatively and for the latter
the algebraic sum might be used. This approach departs from distinguishing
between "and" and "or" aggregation and regards the aggregation as being
somewhere between the "and" and the "or". (Therefore it is often called
compensatory and.)

2. Stick with the distinction between "and" and "or" aggregators and 
introduce a certain degree of compensation into these connectives. 


Compensatory "and". For some applications it seems to be important that the
aggregator used maps above the min-operator and below the max-operator.
The γ-operator [Zimmermann, Zysno 1980] would be such a connective. For
purposes of mathematical programming it has, however, the above-mentioned
disadvantage of low computational efficiency. An acceptable compromise
between empirical fit and computational efficiency seems to be the convex
combination of the min-operator and the max-operator:

$$
\mu_c(x) = \gamma \min_{i=1}^{m} \mu_i(x) + (1 - \gamma) \max_{i=1}^{m} \mu_i(x),
\qquad \gamma \in [0, 1]
\tag{24}
$$

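For determining the maximizing decision the following problem has to be
solved:

$$
\max_{x \in X} \left( \gamma \min_{i=1}^{m} \{\mu_i(x)\} + (1 - \gamma) \max_{i=1}^{m} \{\mu_i(x)\} \right)
$$

or

$$
\begin{array}{ll}
\text{maximize}  & \gamma \lambda_1 + (1 - \gamma) \lambda_2 \\
\text{such that} & \lambda_1 \le \mu_i(x), \quad i = 1,\dots,m \\
                 & \lambda_2 \le \mu_i(x) \quad \text{for at least one } i \in \{1,\dots,m\} \\
                 & x \in X
\end{array}
$$

or

$$
\begin{array}{ll}
\text{maximize}  & \gamma \lambda_1 + (1 - \gamma) \lambda_2 \\
\text{such that} & \lambda_1 \le \mu_i(x), \quad i = 1,\dots,m \\
                 & \lambda_2 \le \mu_i(x) + M y_i, \quad i = 1,\dots,m \\
                 & \sum_{i=1}^{m} y_i \le m - 1 \\
                 & y_i \in \{0, 1\}, \quad M \text{ is a very large real number} \\
                 & x \in X
\end{array}
\tag{25}
$$

For linear membership functions of the goals and the constraints (25) is a
mixed integer linear program that can be solved by the appropriate available
codes.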

If one wants to distinguish between an "and"-aggregation and an "or"-aggregation
(for instance, for the sake of easier modelling) one may want to use
the following operators:

Definition 2 [Werners 1984]

Let $\mu_i(x)$ be the membership functions of fuzzy sets which are to be aggregated
in the sense of a fuzzy and ($\widetilde{\text{and}}$). The membership function of the resulting
fuzzy set is defined to be

$$
\mu_{\widetilde{\text{and}}}(x) = \gamma \min_{i=1}^{m} \mu_i(x) + (1 - \gamma) \frac{1}{m} \sum_{i=1}^{m} \mu_i(x)
$$

with $\gamma \in [0, 1]$.

Definition 3 [Werners 1984]

Let $\mu_i(x)$ be membership functions of fuzzy sets to be aggregated in the sense of
a fuzzy or ($\widetilde{\text{or}}$). The membership function of the resulting fuzzy set is then
defined as

$$
\mu_{\widetilde{\text{or}}}(x) = \gamma \max_{i=1}^{m} \mu_i(x) + (1 - \gamma) \frac{1}{m} \sum_{i=1}^{m} \mu_i(x)
$$

with $\gamma \in [0, 1]$.

These two connectives are not inductive and associative, but they are 
commutative, idempotent, strictly monotonic increasing in each component, 
continuous, and compensatory [Werners 1984, p. 168]. These are certainly very 
useful and acceptable properties. 
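Some of these properties can be checked numerically on small inputs. The sketch below implements the two connectives of Definitions 2 and 3 for a list of membership degrees; the function names `fuzzy_and` and `fuzzy_or` and the sample values are hypothetical.

```python
def fuzzy_and(memberships, gamma):
    """Fuzzy 'and' of Definition 2: gamma * min + (1 - gamma) * arithmetic mean."""
    mean = sum(memberships) / len(memberships)
    return gamma * min(memberships) + (1.0 - gamma) * mean


def fuzzy_or(memberships, gamma):
    """Fuzzy 'or' of Definition 3: gamma * max + (1 - gamma) * arithmetic mean."""
    mean = sum(memberships) / len(memberships)
    return gamma * max(memberships) + (1.0 - gamma) * mean


# Hypothetical membership degrees and compensation parameter.
mu = [0.3, 0.6, 0.9]
gamma = 0.4
print(fuzzy_and(mu, gamma), fuzzy_or(mu, gamma))
# Idempotency: aggregating equal degrees returns that degree (up to rounding).
print(fuzzy_and([0.5, 0.5, 0.5], gamma), fuzzy_or([0.5, 0.5, 0.5], gamma))
```

Both operators are bounded by the minimum and the maximum of their arguments, and the "and" never exceeds the "or" for the same parameter value.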

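If we use the aggregation operator from definition 2 in model (4), then
the "equivalent model" is:

$$
\begin{aligned}
\text{maximize} \quad & \lambda + (1 - \gamma) \frac{1}{m} \sum_{i=1}^{m} \lambda_i \\
\text{such that} \quad & \lambda + \lambda_i \le \mu_i(x), \quad i = 1, \ldots, m \\
& Dx \le d \\
& \lambda, \lambda_i, x \ge 0 \\
& 0 \le \mu_i(x) \le 1
\end{aligned} \tag{26}
$$

If $(\lambda^0, \lambda_i^0, x^0)$ is an optimal solution of (26) then $x^0$ is a maximizing solution to (3).
It is obvious that if the $\mu_i(x)$ are linear, (26) is again a standard linear programming
problem.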
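The link between the operator of Definition 2 and the objective of (26) can be made explicit. The following is a brief sketch of the argument, assuming $0 < \gamma < 1$ and a fixed feasible $x$. The constraints $\lambda + \lambda_i \le \mu_i(x)$ and $\lambda_i \ge 0$ imply $\lambda \le \min_i \mu_i(x)$. Since the objective increases in every $\lambda_i$, each $\lambda_i$ takes its largest admissible value $\lambda_i = \mu_i(x) - \lambda$, so that

$$
\lambda + (1 - \gamma) \frac{1}{m} \sum_{i=1}^{m} \bigl(\mu_i(x) - \lambda\bigr)
= \gamma \lambda + (1 - \gamma) \frac{1}{m} \sum_{i=1}^{m} \mu_i(x).
$$

The right-hand side increases in $\lambda$, hence $\lambda = \min_i \mu_i(x)$ at the optimum, and the objective value equals $\gamma \min_i \mu_i(x) + (1 - \gamma) \frac{1}{m} \sum_i \mu_i(x)$, which is the fuzzy "and" of Definition 2.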

So far, the reference model from which we have departed has always
been the "standard LP". Depending on the type of operator chosen,
the "equivalent model" turns out to be either a linear or a
nonlinear programming model. Obviously, other reference models can be
chosen. This has already been done, for instance, for
integer programming [Zimmermann, Pollatschek 1984; Ignizio et al. 1983],
fractional programming [Luhandjula 1984], nonlinear programming [Sakawa
et al. 1989], etc. The interrelationships between stochastic and fuzzy
programming and their possible integration have also been
investigated [Buckley 1990; Dubois 1986].


3. FUZZY MATHEMATICAL PROGRAMMING WITH 
FUZZY PARAMETERS 


Even for the basic approach described in section 2, a unique
formulation of the "equivalent model", which eventually has to be solved, did
not exist. The diversity of algorithmic approaches is even larger if other types of
fuzzification of elements of mathematical programming models are considered.
To demonstrate the basic idea behind most of the approaches we shall describe
an easy-to-understand suggestion. For more general models the reader is
referred to the literature.

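Ramík and Římánek [1985] consider the problem

$$
\begin{aligned}
\text{maximize} \quad & f(x) \\
\text{such that} \quad & a_{i1} x_1 \oplus a_{i2} x_2 \oplus \cdots \oplus a_{in} x_n \le b_i, \quad i = 1, \ldots, m \\
& x_j \ge 0, \quad j = 1, \ldots, n
\end{aligned} \tag{27}
$$

The $a_{ij}$ and the $b_i$ are supposed to be fuzzy numbers in L-R
representation. $\oplus$ denotes the extended addition. They show that for two fuzzy
L-R numbers $a = (m, n, \alpha, \beta)_{L-R}$ and $b = (p, q, \gamma, \delta)_{L-R}$, $a \le b$ holds iff the
following 4 inequalities hold:

$$
\begin{aligned}
\varepsilon_L (\gamma - \alpha) &\le p - m, & \delta_L (\gamma - \alpha) &\le p - m, \\
\varepsilon_R (\beta - \delta) &\le q - n, & \delta_R (\beta - \delta) &\le q - n,
\end{aligned} \tag{28}
$$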


where $\varepsilon_R = \sup\{u;\ R(u) = R(0) = 1\}$,
$\delta_R = \inf\{u;\ R(u) = \lim_{s \to \infty} R(s)\}$,

and $\varepsilon_L$, $\delta_L$ correspondingly for L.

For "symmetric" fuzzy numbers $a = (m, m, \alpha, \alpha)_{L-L}$ as shown in fig. 1 system
(28) reduces to

$$\varepsilon_L |\alpha - \gamma| \le p - m, \qquad \delta_L |\alpha - \gamma| \le p - m \tag{29}$$

FIG. 1: Fuzzy triangular number a = (m, m, α, α)

On the basis of a lemma that they prove in their paper,

$$a_{i1} x_1 \oplus \cdots \oplus a_{in} x_n = \Bigl( \sum_j m_{ij} x_j,\ \sum_j n_{ij} x_j,\ \sum_j \alpha_{ij} x_j,\ \sum_j \beta_{ij} x_j \Bigr). \tag{30}$$

Hence, the constraints of (27) can be written as

$$
\begin{aligned}
\varepsilon_L \Bigl( \gamma_i - \sum_{j=1}^{n} \alpha_{ij} x_j \Bigr) &\le p_i - \sum_{j=1}^{n} m_{ij} x_j, \\
\delta_L \Bigl( \gamma_i - \sum_{j=1}^{n} \alpha_{ij} x_j \Bigr) &\le p_i - \sum_{j=1}^{n} m_{ij} x_j, \\
\varepsilon_R \Bigl( \sum_{j=1}^{n} \beta_{ij} x_j - \delta_i \Bigr) &\le q_i - \sum_{j=1}^{n} n_{ij} x_j, \\
\delta_R \Bigl( \sum_{j=1}^{n} \beta_{ij} x_j - \delta_i \Bigr) &\le q_i - \sum_{j=1}^{n} n_{ij} x_j,
\end{aligned} \tag{31}
$$

(31) is a system of crisp linear inequalities, which - together with the crisp
objective function - can now be solved with any classical LP-method. Not
counting the nonnegativity constraints, the number of rows in (31) is, however,
four times as large as that of (27). It should also be noted that (28) is a specific
interpretation of the fuzzy inequality relation. The authors offer two other
interpretations, which lead to slightly different results.

Example 2 [Ramík, Římánek 1985]

Consider the following linear programming problem with fuzzy constraints.

$$
\begin{aligned}
\text{Maximize} \quad & z = 5x_1 + 4x_2 \\
\text{subject to} \quad & (4, 4, 2, 1)_{L-L}\, x_1 \oplus (5, 5, 3, 1)_{L-L}\, x_2 \le (24, 24, 5, 8)_{L-L}, \\
& (4, 4, 1, 2)_{L-L}\, x_1 \oplus (1, 1, 0.5, 1)_{L-L}\, x_2 \le (12, 12, 6, 3)_{L-L}, \\
& x_1, x_2 \ge 0,
\end{aligned} \tag{32}
$$

with the function $L: [0, +\infty[\ \to [0, 1]$ being defined by the formula $L(u) = \max\{0, 1 - u\}$ for $u \ge 0$.

As $\varepsilon_L = 0$, $\delta_L = 1$, applying formulae (31) the system (32) is
equivalent to the system of ordinary inequalities

$$
\begin{aligned}
4x_1 + 5x_2 &\le 24, \\
4x_1 + x_2 &\le 12, \\
2x_1 + 2x_2 &\le 19, \\
3x_1 + 0.5x_2 &\le 6, \\
5x_1 + 6x_2 &\le 32, \\
6x_1 + 2x_2 &\le 15, \\
x_1, x_2 &\ge 0
\end{aligned} \tag{33}
$$

In this way, the problem has been transformed to a classical linear
programming problem with the optimal solution and the corresponding value
of the objective function being

$$x_1 = 1.5, \quad x_2 = 3, \quad z = 19.5 \tag{34}$$
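The passage from the fuzzy constraints to the ordinary inequalities can be traced mechanically. The sketch below assumes L = R with the two constants equal to 0 and 1, as in this example, and rearranges each inequality of (31) into the form "coefficients times x is at most a right-hand side". The function name `crisp_rows` and the data layout are hypothetical. The sketch only checks that the point reported in (34) satisfies the rows and reproduces the stated objective value; it does not solve the linear program.

```python
def crisp_rows(a_coeffs, b, eps=0.0, delta=1.0):
    """Crisp rows for one fuzzy constraint with L = R.

    a_coeffs: one (m, n, alpha, beta) tuple per variable.
    b:        the right-hand side (p, q, gamma, delta_b).
    Returns (coefficients, rhs) pairs meaning sum(coefficients * x) <= rhs.
    """
    p, q, gam, dlt = b
    rows = []
    for k in (eps, delta):
        # k*(gamma - sum alpha*x) <= p - sum m*x  ->  sum (m - k*alpha)*x <= p - k*gamma
        left = ([m - k * al for (m, n, al, be) in a_coeffs], p - k * gam)
        # k*(sum beta*x - delta_b) <= q - sum n*x ->  sum (n + k*beta)*x <= q + k*delta_b
        right = ([n + k * be for (m, n, al, be) in a_coeffs], q + k * dlt)
        for row in (left, right):
            if row not in rows:  # symmetric numbers produce duplicate rows
                rows.append(row)
    return rows


constraints = [
    ([(4, 4, 2, 1), (5, 5, 3, 1)], (24, 24, 5, 8)),
    ([(4, 4, 1, 2), (1, 1, 0.5, 1)], (12, 12, 6, 3)),
]
rows = [row for a, b in constraints for row in crisp_rows(a, b)]

x = (1.5, 3.0)
feasible = all(sum(c * xi for c, xi in zip(coeffs, x)) <= rhs + 1e-9
               for coeffs, rhs in rows)
z = 5 * x[0] + 4 * x[1]
print(rows)
print(feasible, z)
```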

With respect to section 2, complementary approaches to the one
described above are those of Tanaka et al. [1984, 1985]. There, similar assumptions
concerning the fuzzy sets are made, but the objective function(s) and the
nonnegativity constraints are also fuzzified. Rommelfanger [1989] also goes
in this direction.

More general treatments of this problem can be found in Delgado et 
al. [1989], Dubois [1987], Orlovski [1989] and others. 


4. APPLICATIONS 

Fuzzy Mathematical Programming has been applied to other areas of 
theoretical investigations as well as to practical applications. 


4A METHODOLOGICAL APPLICATIONS 

Due to the "symmetry" of the majority of the models in FMP the number of 
objective functions does not matter. In classical mathematical programming, 
however, normally only one objective function, which generates the order over 
the solution space, could be accepted. If there is more than one objective
function, multi objective decision making models or "vectorial optimization" 
models have to be applied, which normally require a much higher 
computational effort. It is, therefore, quite natural that FMP has been applied 
extensively to the area of multi criteria analysis. 

If the objective function can be calibrated by given aspiration levels 
either model (4) can be used directly or Tanaka et al. [1985] can be used for 
fuzzy parameters. If the objective functions cannot be calibrated naturally, 
model (19) can be used. This is even possible if the constraints are crisp rather 
than fuzzy. [Zimmermann 1978] 

Modern systems for multi objective decision making are generally 
interactive. FMP has also been applied to these decision making tools [Sakawa 
et al. 1990; Yano et al. 1989]. 

An interesting application of FMP is Campos’ contribution [1989] in 
which zero-sum matrix games with imprecise payoffs are considered. 

Somewhere between methodological applications and real applications
(which have been installed and used in practice) are those applications of
FMP in which solutions to functional problems have been suggested but not
(yet) really been implemented. Examples of this type of application are
described in Wiedey, Zimmermann [1978] (Media Selection), Ernst [1982]
(Logistics), Holtz, Desonki [1981] (Maintenance), Hintz, Zimmermann [1989]
(Production Planning), Nickels [1990] (Cutting Stock Problem), etc.

4B PRACTICAL APPLICATIONS

Real applications of fuzzy mathematical programming are still pretty rare. This 
is certainly not due to weaknesses of FLP. Experience shows that people who 
have been using linear programming for quite a while have become so used to 
"cutting problems" to LP-models that they do not see the need to allow for 
uncertainty. The acceptance of FLP seems to be higher amongst people who 
have never used LP before and who are looking for tools to solve their 
problems properly. Another reason for not finding interesting applications in 
the literature is, of course, that good applications are not published for 
competitive reasons and failures are not published for other obvious reasons. 
FLP has been applied to blending problems with sensory constraints (such as 
the blending of chocolate stretch, champagne cuvée, paints etc.). The paint
application was published [Zimmermann et al. 1986]. Another application was 
in logistics by Ernst [1982], which we will sketch in the following. He suggests a 
fuzzy model for the determination of time schedules for containerships, which 
can be solved by branch and bound, and a model for the scheduling of 
containers on containerships, which results eventually in an LP. We shall only 
consider the last model (a real project). 

The model contained in a realistic setting approximately 2,000 
constraints and originally 21,000 variables, which could then be reduced to 
approximately 500 variables. Thus it could be handled adequately on a modern 
computer. It is obvious, however, that a description of this model in a textbook 
would not be possible. We shall, therefore, sketch the contents of the modeling
verbally and then concentrate on the aspects that included fuzziness. 

The system is the core of a decision support system for the purpose of 
scheduling properly the inventory, movement, and availability of containers, 
especially empty containers, in and between 15 harbors. The containers were 
shipped according to known time schedules on approximately 10 big 
containerships worldwide on 40 routes. The demand for container space in the 
harbors was to a high extent stochastic. Thus the demand for empty containers 
in different harbors could either be satisfied by large inventories of empty 
containers in all harbors, causing high inventory costs, or they could be shipped 
from their locations to the locations where they were needed, causing high 
shipping costs and time delays. 

Thus the system tries to control optimally primarily the movements 
and inventories of empty containers, the capacities of the ships, and the 
predetermined time schedule of the ships. 

This problem was formulated as a large LP model. The objective 
function maximized profit (from shipping full containers) minus cost for 
moving empty containers minus inventory cost of empty containers. When 
comparing data of past periods with the model it turned out that very often
ships transported more containers than their specific maximum capacity. This,
after further investigations, led to a fuzzification of the ship's capacity
constraints, which will be described in the next model.

[Ernst 1982, p. 90] 

Let

$z = c^T x$ be the net profit to be maximized,
$Bx \le b$ the set of crisp constraints,
$Ax \lesssim d$ the set of capacity constraints for which a crisp formulation
turned out to be inappropriate.

Then the problem to be solved is:

$$
\begin{aligned}
\text{maximize} \quad & z = c^T x \\
\text{such that} \quad & Ax \lesssim d \\
& Bx \le b \\
& x \ge 0
\end{aligned} \tag{35}
$$


This corresponds to (14). Rather than using (18) to arrive at a crisp 
equivalent LP model the following approach was used: Basing on (11) and (12) 
the following membership functions were defined for those constraints that 
were fuzzy: 

$$0 \le t_i \le p_i - d_i, \quad i \in I,$$

I = Index set of fuzzy constraints.

As the equivalent crisp model to (35) the following LP was used: 

$$
\begin{aligned}
\text{maximize} \quad & z' = c^T x - \sum_{i \in I} s_i (p_i - b_i)\, \mu_i(t_i) \\
\text{such that} \quad & Ax \le d + t \\
& Bx \le b \\
& t \le p - b \\
& x, t \ge 0
\end{aligned} \tag{36}
$$

where the $s_i$ are problem-dependent scaling factors with penalty character.

Formulation (36) only makes sense if problem-dependent penalty
terms $s_i$, which also have the required scaling property, can be found and
justified.

In this case the following definitions performed successfully: First the
crisp constraints $Bx \le b$ were replaced by $Bx \le 0.9b$, providing a 10% leeway of
capacity, which was desirable for reasons of safety. Then "tolerance" variables t
were introduced:

$$Bx - t \le 0.9b, \qquad t \le 0.1b$$

The objective function became

$$\text{maximize} \quad z = c^T x - s^T t$$

s was defined to be

$$s = \frac{\text{average profit of shipping a full container}}{\text{average number of time periods which elapsed between departure and arrival of a container}}$$


By the use of this definition more than 90% of the capacity of the ships 
was used only if and when very profitable full containers were available for 
shipping at the ports, a policy that seemed to be very desirable to the decision 
makers. 
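The effect of the penalty can be illustrated on a deliberately small case. The sketch below assumes a single ship on a single leg with unit-size containers, which is far simpler than the real project; the function name `load_plan`, the slot counts, and the profits are hypothetical. Containers are ranked by profit, the first 90% of the slots are filled with any profitable container, and a tolerance slot is used only if the container's profit exceeds the penalty s.

```python
def load_plan(profits, base_slots, tolerance_slots, s):
    """Greedy loading for one ship with unit-size containers (toy model).

    base_slots:      slots available without penalty (90% of capacity).
    tolerance_slots: additional slots, each charged with the penalty s.
    """
    ranked = sorted(profits, reverse=True)
    base = [p for p in ranked[:base_slots] if p > 0]
    extra = [p for p in ranked[base_slots:base_slots + tolerance_slots] if p > s]
    return base, extra


# Hypothetical profits per container and a hypothetical penalty of 9 per slot.
base, extra = load_plan([50, 40, 30, 20, 10, 8, 5],
                        base_slots=4, tolerance_slots=2, s=9)
print(base, extra)
```

Only one of the two tolerance slots is used in this illustration, because the next container's profit does not cover the penalty.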


5. CONCLUSIONS 

Mathematical programming is one of the areas to which fuzzy set theory has 
been applied extensively. Even if one considers the area of linear programming 
only, numerous new models - linear and nonlinear - have emerged through the 
application of fuzzy set theory. A good part of the models are of primarily 
theoretical interest. Still even from an application point of view, fuzzy 
mathematical programming is a valuable extension of traditional crisp 
optimization models. It is surprising that some areas, such as duality theory, 
have not yet drawn more interest. There further developments can still be
expected.

INTRODUCTION

Computer vision is the study of theories and algorithms involving the 
sensing and transmission of images; preprocessing of digital images for noise 
removal, smoothing, or sharpening of contrast; segmentation of images to isolate 
objects and regions; description and recognition of the segmented regions; and 
finally interpretation of the scene. We normally think of images in the visible 
spectrum, either monochrome or color, but in fact, images can be produced by 
a wide range of sensing modalities including X-rays, neutrons, ultrasound, 
pressure sensing, laser range finding, infrared, and ultraviolet, to name a few. 

Uncertainty abounds in every phase of computer vision. Some of the 
sources of this uncertainty include: additive and non-additive noise of various 
sorts and distributions in the sensing and transmission processes, questions which 
are often ill-posed, vagueness in class definitions, imprecisions in computations, 
ambiguity of representations, and general problems in the interpretation of 
complex scenes. The use of multiple modalities is receiving increased attention 
as a means of overcoming some of the limitations imposed by a single image, but 
the use of more than one source of information has caused new uncertainties to 
surface: How should the complementary and supplementary information be
combined? How should redundant information be treated? How should conflicts
be resolved?

Traditionally, probability theory was the primary mathematical model 
used to deal with uncertainty problems in computer vision. More recently, both 
Dempster-Shafer belief theory and fuzzy set theory have gained popularity in 
modeling and propagating uncertainty in imaging applications. While both 
probability theory and belief theory are important frameworks for this field, the 
purpose of this paper is to explore the use of fuzzy set theory in computer vision. 
We will consider contributions of fuzzy sets to the image model, preprocessing, 
segmentation, object/region recognition, and reasoning aspects of the computer 
vision problem. Most of the examples given in this paper are those of the 
authors and we apologize a priori to those researchers whose work we will 
undoubtedly (though inadvertently) omit from the references.

LOW LEVEL IMAGE PROCESSING

An image is a function $f: R^n \to R^m$ where normally n is 2 or 3 and m is
1 (intensity) or 3 (color). However, images can be constructed over numerous 
modalities, as well as over time, and so, the dimension of the range space can 
be quite large. A digital image is an image which has been discretized in both 
the domain and range spaces. This is commonly referred to as sampling and 
quantization respectively. In this paper we will restrict ourselves to two spatial 
coordinates, where each element P=(x,y) in the domain of the image is called 
a pixel. If m = 1, then the value f(x,y) is called the gray level of pixel (x,y); if
m > 1, then f(x,y) is referred to as a feature vector.

The first connection of fuzzy set theory to computer vision was made by 
Prewitt [1] who suggested that the results of image segmentation should be fuzzy 
subsets rather than crisp subsets of the image plane. In order to apply the rich 
assortment of fuzzy set theoretic operators to an image, the gray levels (or 
feature values) must be converted to membership values. Let X denote the 
domain of the digital image. Then a fuzzy subset of X is a mapping $\mu_f: X \to [0,1]$,
where the value of $\mu_f(x,y)$ is dependent upon the original feature vector f(x,y).
The calculation of membership functions is central to the application of fuzzy set 
theory, just as the calculation of conditional probability density functions or basic 
probability assignments are crucial in the use of probabilistic or Dempster-Shafer 
belief models. 


There are many methods of transforming pixel feature vectors into 
membership functions. In the case of gray scale images several authors have 
used S-functions when there are only two regions (object and background) and
combinations of S-functions and π-functions for multiple regions (or suitable
generalizations) [2-6]. These functions are defined by [7]

$$
S(z; a, b, c) =
\begin{cases}
0, & z \le a \\
2 \left( \dfrac{z - a}{c - a} \right)^2, & a < z \le b \\
1 - 2 \left( \dfrac{z - c}{c - a} \right)^2, & b < z \le c \\
1, & z > c
\end{cases}
$$

where z is the gray level at pixel P. These functions are symmetric, but can be
easily made nonsymmetric by relaxing the requirement that b be the midpoint
of a and c. Intuitively, these functions correspond to the statements "z is bright"
and "z is approximately c", respectively. Pal and King [2-4] used S and π functions,
along with approximations of them, as the basic building blocks for both contrast 
enhancement and smoothing. Following Nakagawa and Rosenfeld [8], they 
applied min and max operations on membership values in the neighborhood of 
each pixel to produce smoothing or edge detection. Other approaches to edge 
detection using fuzzy set methods can be found in [9,10]. 
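For orientation, the two membership functions can be written out in a few lines. The sketch below assumes the standard forms commonly attributed to Zadeh: an S-function whose cross-over point is the midpoint of a and c, and a π-function built from two S-functions around a center. The function names and the parameter names `bandwidth` and `center` are hypothetical, and the nonsymmetric variant mentioned in the text is not covered.

```python
def s_function(z, a, c):
    """Standard S-function rising from 0 at a to 1 at c, cross-over at the midpoint."""
    b = (a + c) / 2.0
    if z <= a:
        return 0.0
    if z <= b:
        return 2.0 * ((z - a) / (c - a)) ** 2
    if z <= c:
        return 1.0 - 2.0 * ((z - c) / (c - a)) ** 2
    return 1.0


def pi_function(z, bandwidth, center):
    """Standard pi-function: 1 at the center, 0.5 at center +/- bandwidth / 2."""
    if z <= center:
        return s_function(z, center - bandwidth, center)
    return 1.0 - s_function(z, center, center + bandwidth)


# Hypothetical gray levels on a 0..255 scale.
for z in (0, 64, 128, 192, 255):
    print(z, s_function(z, 0, 255), pi_function(z, 128, 128))
```

Here `s_function` models "z is bright" and `pi_function` models "z is approximately `center`".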

One problem with this approach is that the parameters which define the 
membership functions must be supplied, primarily in an interactive fashion by 
the user. Pal and Rosenfeld [11], in a two class segmentation problem, 
automated this process by using several choices and picking the one which 
optimized a certain geometric criterion which we will describe later. Recently, 
we have used normalized histograms of the feature values generated from 
training data to estimate the particular membership functions [12-14]. This has 
the advantages that it does not force any particular shape to the resultant 
distributions, can be extended to deal with multiple features instead of gray level 
alone, and can easily accommodate the addition of new classes. 

Probably the most popular method of assigning multi-class membership 
values to pixels, for either segmentation or other processing, is to use the fuzzy 
c-means (FCM) algorithm [15,16]. Let R be the set of real numbers and $R^d$ be
the d-dimensional vector space over the reals. Let X be a finite subset of $R^d$,
$X = \{x_1, x_2, \ldots, x_n\}$. In our case, each $x_k$ is a feature vector for a pixel in the
image. For an integer c, $2 \le c < n$, a c × n matrix $U = [u_{ik}]$ is called a fuzzy
c-partition of X whenever the entries of U satisfy three constraints:

$$\sum_{i=1}^{c} u_{ik} = 1 \quad \text{for all } k$$

$$\sum_{k=1}^{n} u_{ik} > 0 \quad \text{for all } i$$

$$u_{ik} \in [0, 1] \quad \text{for all } i, k.$$

Column j of the c × n matrix U represents membership values of $x_j$ in the c fuzzy
subsets of X. Row i of U exhibits values of a membership function $u_i$ on X
whereby $u_{ik} = u_i(x_k)$ denotes the grade of membership of $x_k$ in the ith fuzzy
subset of X.
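The three constraints are easy to verify for a given matrix. The following sketch stores U as a list of c rows of length n; the function name `is_fuzzy_c_partition`, the tolerance, and the sample matrices are hypothetical.

```python
def is_fuzzy_c_partition(U, tol=1e-9):
    """Check the three fuzzy c-partition constraints for a c x n matrix U."""
    c, n = len(U), len(U[0])
    in_range = all(0.0 <= u <= 1.0 for row in U for u in row)
    columns_sum_to_one = all(
        abs(sum(U[i][k] for i in range(c)) - 1.0) <= tol for k in range(n))
    no_empty_cluster = all(sum(row) > 0.0 for row in U)
    return in_range and columns_sum_to_one and no_empty_cluster


# Hypothetical memberships of n = 3 pixels in c = 2 classes.
U_ok = [[0.9, 0.4, 0.0],
        [0.1, 0.6, 1.0]]
U_bad = [[0.9, 0.4, 0.3],
         [0.3, 0.6, 0.7]]   # first column sums to 1.2
print(is_fuzzy_c_partition(U_ok), is_fuzzy_c_partition(U_bad))
```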

The FCM algorithm attempts to cluster feature vectors by searching for
local minima of the following objective function:

$$J_m(U, V) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, \|x_k - v_i\|_A^2, \qquad 1 \le m < \infty$$

where U is a fuzzy c-partition of X, $\|\cdot\|_A$ is any inner product norm, $V = \{v_1, v_2, \ldots, v_c\}$
is a set of cluster centers, $v_i \in R^d$, and $m \in [1, \infty)$ is the membership
weighting exponent.

Cluster center $v_i$ is regarded as a prototypical member of class i, and the
norm measures the similarity (or dissimilarity) between the feature vectors and
cluster centers. When m = 1, $J_m$ is the classical total within-group sum-of-squared
error function; the $u_i$ define hard clusters in X and the $v_i$ are the
centroids of the hard $u_i$. It is shown in [16] that for m > 1 under the assumption
that $x_k \ne v_i$ for all i, k, (U, V) may be a local minimum of $J_m$ only if

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\|x_k - v_i\|_A}{\|x_k - v_j\|_A} \right)^{2/(m-1)} \right]^{-1} \quad \text{for all } i, k \tag{1}$$

and

$$v_i = \frac{\sum_{k=1}^{n} (u_{ik})^m x_k}{\sum_{k=1}^{n} (u_{ik})^m} \quad \text{for all } i. \tag{2}$$

The algorithm defined by looping iteratively through the above
conditions is known to generate sequences (or subsequences) that terminate at
fixed points of $J_m$. The FCM algorithm is comprised of the following steps

BEGIN
    Set c, 2 ≤ c < n
    Set ε, ε > 0
    Set m, 1 < m < ∞
    Initialize U^0
    Initialize j = 0
    DO UNTIL ( ||U^j - U^(j-1)|| < ε )
        Increment j
        Calculate {v_i^j} using (2) and U^(j-1)
        Compute U^j using (1) and {v_i^j}
    END DO UNTIL
END
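The loop can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the authors' code: it uses the Euclidean norm, a random initial partition, the largest absolute change of any membership as the stopping norm, and the usual convention of splitting membership equally when a point coincides with a center. All function names and the sample data are hypothetical.

```python
import random


def sq_dist(x, v):
    """Squared Euclidean distance between a feature vector and a center."""
    return sum((xi - vi) ** 2 for xi, vi in zip(x, v))


def update_centers(X, U, m):
    """Cluster centers as membership-weighted means (weights u_ik ** m)."""
    centers = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        centers.append([sum(wk * x[d] for wk, x in zip(w, X)) / total
                        for d in range(len(X[0]))])
    return centers


def update_memberships(X, V, m):
    """Memberships from relative distances to all centers."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(X):
        d2 = [sq_dist(x, v) for v in V]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:  # x_k coincides with a center: share membership among those
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            # (d2_i / d2_j) ** (1/(m-1)) equals (d_i / d_j) ** (2/(m-1))
            U[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1.0))
                                for j in range(c))
    return U


def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Fuzzy c-means; returns the partition matrix U and the centers V."""
    rng = random.Random(seed)
    n = len(X)
    # Random initial partition with columns normalised to sum to one.
    U = [[rng.random() + 1e-6 for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= s
    for _ in range(max_iter):
        V = update_centers(X, U, m)
        U_new = update_memberships(X, V, m)
        change = max(abs(U_new[i][k] - U[i][k])
                     for i in range(c) for k in range(n))
        U = U_new
        if change < eps:
            break
    return U, V


# Hypothetical one-dimensional gray-level features forming two groups.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
U, V = fcm(X, c=2)
print(V)
```

The iteration cap `max_iter` is an added safeguard that the listing above does not contain.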

The inner product norm $\|\cdot\|_A$, or its replacement by more general distance
metrics $d^2(x_k, v_i)$ (as will be used later), controls the final shape of the clusters
generated by the FCM: hyperspherical, hyperellipsoidal, linear subspace, etc.

In terms of generating membership functions for later processing, the 
fuzzy c-means has several advantages. It is unsupervised, that is, it requires no 
initial set of training data; it can be used with any number of features and any
number of classes; and it distributes the membership values in a normalized 
fashion across the various classes based on "natural" groupings in feature space. 
However, being unsupervised, it is not possible to predict ahead of time what 
type of clusters will emerge from the fuzzy c-means from a perceptual 
standpoint. Also, the number of classes must be specified for the algorithm to 
run, although as will be seen in the next section, there are modifications which 
avoid this problem. Finally, iteratively clustering features for a 512 x 512 
resolution image can be quite time consuming. In [17, 18], approximations and 
simplifications were introduced to ease this computational burden. 

SEGMENTATION 

Image segmentation is one of the most critical components of the 
computer vision process. Errors made in this stage will impact all higher level 
activities. Therefore, methods which incorporate the uncertainty of object and 
region definition and the faithfulness of the features to represent various objects 
and regions are desirable. 

The process of segmentation has been defined by Horowitz and Pavlidis 
[19] as follows: Given a definition of uniformity, a segmentation is a partition 
of the picture into connected subsets, each of which is uniform, but such that no 
union of adjacent subsets is uniform. 

This definition is based on crisp set theory. The fuzzy c-partition 
introduced in the previous section can be defined as a fuzzy segmentation. 
Furthermore, if we define a uniformity predicate that assigns the
value true or false to the sample point $x_j$ based on its membership value (for
example, the predicate for class i equals 1 if $\mu_{ij} > \mu_{kj}$ for all $k \ne i$), we will
have paralleled crisp segmentation. The fuzzy c-means has been successfully used as a segmentation
approach by several researchers [17, 18, 20-22]. (We will see an example of this
segmentation shortly.)
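Turning a fuzzy c-partition into a crisp labeling by the maximum-membership rule can be sketched as follows. The function name `harden` and the sample matrix are hypothetical, and ties are resolved in favor of the lowest class index, which is an arbitrary choice of this sketch.

```python
def harden(U):
    """Assign each sample to the class in which its membership is largest."""
    c, n = len(U), len(U[0])
    return [max(range(c), key=lambda i: U[i][k]) for k in range(n)]


# Hypothetical memberships of n = 3 pixels in c = 2 classes.
U = [[0.9, 0.2, 0.5],
     [0.1, 0.8, 0.5]]
print(harden(U))
```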

All of the methods for converting image feature values into class
membership numbers contain adjustable parameters: the cross-over point b for S and
π functions, the fuzzifier m in the c-means, etc. Varying these parameters affects
the final fuzzy partition, and hence the ultimate crisp segmentation of the scene. 
Also, the number of classes desired impacts the resultant distributions, since the 
memberships are required to sum to one for a fuzzy c-partition. In some cases, 
these problems are not serious. For example, many segmentation problems 
involve separating an object from its background. Here the number of classes 
is obviously two. However, in general situations, the choice of these parameters 
must be carefully considered.

The basic approach which is taken to pick the number of classes and/or
the function shaping parameters iteratively varies these parameters and picks the 
set of values which optimizes some measure of the final fuzzy partition. The 
optimization criteria can be based on the geometry of the fuzzy subsets of the 
image or on properties of the clusters in feature space. 

In a series of papers [23-25], Rosenfeld studied the geometry and 
topology of fuzzy subsets of the digital (i.e., image) plane. These properties were 
later generalized by Dubois and Jaulent [26, 27]. Many of the basic geometric 
properties of, and relationships among, regions can be generalized to fuzzy 
subsets. Rosenfeld has extended the theory of these fuzzy subsets to include the 
topological concepts of connectedness, adjacency and surroundedness, extent and 
diameter, and convexity. Rosenfeld et al. have also developed geometrical 
operations on fuzzy image subsets, including shrinking and expanding, and 
thinning [28, 29].

Of the above-mentioned geometrical properties, we discuss here only the 
connectedness, area, perimeter, and compactness of a fuzzy image subset, 
characterized by a membership function array $\mu_f(x_{ij})$. In defining the above
mentioned parameters we replace $\mu_f(x_{ij})$ by $\mu$ for simplicity.

A neighbor can be defined in several ways. In two-dimensional images, 
point P = (x, y) of a digital image has two horizontal and two vertical neighbors,
namely the points: (x-1, y), (x, y-1), (x+1, y), and (x, y+1). The four points are
called the 4-connected neighbors of P, and we say that they are 4-adjacent to P.
Similarly, P has four additional neighbors: (x-1, y-1), (x-1, y+1), (x+1, y-1), and
(x+1, y+1). We call these eight points the 8-connected neighbors of P
(8-adjacent to P) [30].

The definition of connectedness for the crisp case as defined by 
Rosenfeld [30] is as follows: Let P, Q be two points of an image. A path p of
length n from P to Q in an image is a sequence of points $P = P_1, P_2, \ldots, P_n = Q$
such that $P_i$ is a neighbor of $P_{i-1}$, $1 < i \le n$. There are two versions of path p
(a 4-path or an 8-path) depending on whether "neighbor" means "4-neighbor" or
"8-neighbor". If P and Q are points of an image subset S, we say that P is 4-(8-)
connected to Q in S if there exists a 4- (or 8-) path from P to Q. For any P in
S, the set of points which are connected to P in S is called a connected
component of S.


For the fuzzy case, let $\mu$ be a mapping from X into [0,1], that is, let $\mu$
be a fuzzy subset of X. Let $P, Q \in X$. Then the degree of connectedness of P
and Q with respect to $\mu$ is

$$C_\mu(P, Q) = \max_{p_{PQ}} \left[ \min_{R \in p_{PQ}} \mu(R) \right]$$

where the operator max is taken over all paths $p_{PQ}$ (either 4- or 8-path
connected) from P to Q, and the operator min is taken over all points R on the
path. P and Q are said to be connected in $\mu$ if

$$C_\mu(P, Q) \ge \min(\mu(P), \mu(Q)).$$

The fuzzy set $\mu$ is said to be connected if every pair of points P, Q is connected
in $\mu$.
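The max-min definition need not be evaluated by enumerating paths. A best-first search that always expands the strongest partial path computes the same quantity. The sketch below is one such implementation, assuming the membership array is a list of rows indexed as `mu[row][col]`; the function names and the sample array are hypothetical.

```python
import heapq

N4 = [(-1, 0), (0, -1), (1, 0), (0, 1)]
N8 = N4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def connectedness(mu, P, Q, offsets=N4):
    """Degree of connectedness: the largest, over all paths, of the weakest membership."""
    rows, cols = len(mu), len(mu[0])
    best = {P: mu[P[0]][P[1]]}
    heap = [(-best[P], P)]  # max-heap on path strength via negated keys
    while heap:
        neg_strength, (r, c) = heapq.heappop(heap)
        strength = -neg_strength
        if (r, c) == Q:
            return strength
        if strength < best[(r, c)]:
            continue  # stale heap entry
        for dr, dc in offsets:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                s = min(strength, mu[nr][nc])
                if s > best.get((nr, nc), -1.0):
                    best[(nr, nc)] = s
                    heapq.heappush(heap, (-s, (nr, nc)))
    return 0.0


def is_connected_pair(mu, P, Q, offsets=N4):
    """True if P and Q are connected in mu in the sense defined above."""
    return connectedness(mu, P, Q, offsets) >= min(mu[P[0]][P[1]], mu[Q[0]][Q[1]])


# Hypothetical membership array: two strong columns separated by a weak one.
mu = [[0.9, 0.2, 0.8],
      [0.9, 0.1, 0.8],
      [0.9, 0.7, 0.8]]
print(connectedness(mu, (0, 0), (0, 2)), is_connected_pair(mu, (0, 0), (0, 2)))
```

Passing `offsets=N8` switches from 4-paths to 8-paths.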


The area of $\mu$ is defined as

$$a(\mu) = \int \mu$$

where the integral is taken over the whole image set, or for the digital case,

$$a(\mu) = \sum_{m=1}^{M} \sum_{n=1}^{N} \mu_{mn}$$

Let us call a fuzzy subset $\mu$ of S "piecewise constant" if there exists a
segmentation $\Sigma = \{S_1, \ldots, S_n\}$ of S such that $\mu$ has a constant value $\mu_i$ on each
$S_i$ and $\mu = 0$ on $S_n$ (i.e., $\mu_n = 0$). Here, $S_n$ is considered the boundary of the
image. If $\mu$ is piecewise constant (for example, in a digital image) $a(\mu)$ is the
weighted sum of the areas of the regions on which $\mu$ has constant values, where
the areas of the regions are weighted by these values.

For the piecewise constant case, the perimeter of $\mu$ is defined as

$$p(\mu) = \sum_{i < j} \sum_{k} |\mu_i - \mu_j| \, |A_{ijk}|.$$

This is just the weighted sum of the lengths of the arcs $A_{ijk}$ along which
the i-th and j-th regions having constant $\mu$ values $\mu_i$ and $\mu_j$ respectively meet,
weighted by the absolute difference of these values.

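Considering the 4-adjacent definition for connectedness, the above
equation for $p(\mu)$ reduces to:

$$p(\mu) = \sum_{m=1}^{M} \sum_{n=1}^{N-1} |\mu_{mn} - \mu_{m,n+1}| + \sum_{n=1}^{N} \sum_{m=1}^{M-1} |\mu_{mn} - \mu_{m+1,n}|$$

The compactness of $\mu$ is then defined as

$$\text{comp}(\mu) = \frac{a(\mu)}{p^2(\mu)}$$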

For crisp sets, the compactness is largest for a disk, where it is equal to
1/(4π). For a fuzzy disk where $\mu$ depends only on the distance from the origin
(center), it can be shown that

$$\frac{a(\mu)}{p^2(\mu)} \ge \frac{1}{4\pi}.$$

In other words, of all possible fuzzy disks, the compactness is smallest for the
crisp version.

Therefore, one approach to finding an optimal partition is to generate
many candidate partitions by varying the membership generation parameters and
then choosing the partition which minimizes the fuzzy compactness of the result.
Pal and Rosenfeld used this technique in a two class problem (object and
background) to find the best choice of S function shaping parameters [11]. In [31]
Liao extended this approach to the case of several features and several classes
by using the fuzzy c-means. Here the fuzzifier m was the variable. Figure 1a
shows a 256 x 256 forward looking infrared image of a natural scene containing
trees, grass areas and two vehicles. Because of the noisy nature of infrared
images, this picture was smoothed using local averaging (Figure 1b). The
number of classes was fixed at 4 and the fuzzifier m varied from 1.2 to 5.0. For
each choice of m, the sum of the compactness values of the resultant four fuzzy
subsets was computed and the value of m (m = 3.0) giving minimal overall
compactness was chosen. The result of the closest crisp partition segmentation
is shown in Figure 1c. Note that there are still many small noise components in
the segmentation. By smoothing the membership matrix, giving higher weight
to the vehicle class, and performing a noise cleaning operation
(shrink-and-expand) [30], the excellent segmentation shown in Figure 1d was obtained. The
important point is that the initial segmentation formed a fuzzy c-partition of the
image, and so, post-processing on the fuzzy subsets of the image was possible.
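The area, perimeter, and compactness of a fuzzy image subset are straightforward to compute for a digital membership array. The sketch below assumes 4-adjacency and a list-of-rows array `mu`; the function names and the two small test images are hypothetical.

```python
def area(mu):
    """Sum of all membership values."""
    return sum(sum(row) for row in mu)


def perimeter(mu):
    """Sum of absolute membership differences between 4-adjacent pixels."""
    M, N = len(mu), len(mu[0])
    horizontal = sum(abs(mu[m][n] - mu[m][n + 1])
                     for m in range(M) for n in range(N - 1))
    vertical = sum(abs(mu[m][n] - mu[m + 1][n])
                   for n in range(N) for m in range(M - 1))
    return horizontal + vertical


def compactness(mu):
    """Area divided by the squared perimeter."""
    p = perimeter(mu)
    if p == 0:
        raise ValueError("perimeter is zero; compactness is undefined")
    return area(mu) / p ** 2


def block(value):
    """4 x 4 image with a 2 x 2 block of the given membership in the middle."""
    return [[value if 1 <= r <= 2 and 1 <= c <= 2 else 0.0 for c in range(4)]
            for r in range(4)]


print(compactness(block(1.0)), compactness(block(0.5)))
```

In this illustration the half-membership block has a larger compactness than the crisp block of the same shape, in line with the observation that the crisp version gives the smallest value.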


The a priori setting of the number of classes is not always possible,
especially in segmentation of natural scenes. In such cases an algorithm called the
Unsupervised Fuzzy Partition-Optimum Number of Clusters (UFP-ONC)
algorithm [32] may be used. The UFP-ONC algorithm is derived from a combination
of the fuzzy c-means algorithm and the fuzzy maximum likelihood estimation
(FMLE). It attempts to obtain a satisfactory solution to the problem of large
variability in cluster shapes and densities, and to the problem of unsupervised
tracking of classification prototypes. There are no initial conditions on the
location of cluster centroids, and classification prototypes are identified during
a process of unsupervised learning [32]. The algorithm is essentially the same
as the FCM algorithm described in the previous section, except that the distance
measure defined by

$$d^2(x_k, v_i) = \frac{|F_i|^{1/2}}{P_i} \exp\left\{ \tfrac{1}{2} (x_k - v_i)^T F_i^{-1} (x_k - v_i) \right\} \tag{3}$$

is used instead of the inner product norm. In (3) $F_i$ is the fuzzy covariance
matrix of cluster i given by


F| . ^ 
and $P_i$ is the a priori probability of the i-th cluster defined by

where N is the total number of feature vectors. In addition to updating the
cluster prototypes using (2) and the memberships using (3), the covariance
matrices $F_i$ are also updated in every iteration. After the algorithm converges,
certain performance measures are computed for the resulting fuzzy partition.
This process is repeated for increasing number of clusters in the data set
computing performance measures in each run, until a partition into an optimal
number of subgroups is obtained.

The performance criterion in this algorithm is the minimization of the
overall hypervolume of the clusters as calculated from the determinants of the 
fuzzy covariance matrices. Figure 2a shows an intensity image containing trees, 
roads, and sky regions. A set of new local features, based on fractal geometry 
was generated from this image [33]. Figure 2b shows the resulting segmentation 
when these fractal features were used with the UFP-ONC algorithm. An 
another example, Figure 2c shows the original range image of a block. Figure 
2d shows the segmentation obtained when the mean curvature and another new 
differential geometric feature [34] were used as the input to the UFP-ONC 
algorithm. The clusters corresponding to the minimum total hypervolume in the 
feature space are shown in Figure 2e. As can be seen the UFP-ONC algorithm 
is effective in locating the ellipsoidal clusters of various sizes and orientations. 
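
As a rough illustration of the hypervolume criterion, the sketch below computes a membership-weighted covariance matrix for each cluster and sums the square roots of the determinants. Both the weighting used in `fuzzy_covariance_2d` and the square-root form of the hypervolume are assumptions made for this example (the text only says that the hypervolume is calculated from the determinants), and the code handles two-dimensional features only.

```python
import math

def fuzzy_covariance_2d(points, memberships, center):
    """Membership-weighted 2x2 covariance of one cluster (assumed form)."""
    sxx = sxy = syy = total = 0.0
    for (x, y), u in zip(points, memberships):
        dx, dy = x - center[0], y - center[1]
        sxx += u * dx * dx
        sxy += u * dx * dy
        syy += u * dy * dy
        total += u
    return ((sxx / total, sxy / total), (sxy / total, syy / total))

def fuzzy_hypervolume(covariances):
    """Sum over clusters of sqrt(det(F_i)) (assumed form of the criterion)."""
    return sum(
        math.sqrt(F[0][0] * F[1][1] - F[0][1] * F[1][0]) for F in covariances
    )
```

Running the clustering for several values of c and keeping the partition with the smallest such value mirrors the selection loop described above.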

A different approach to both segmentation and object recognition is 
taken by Krishnapuram and Lee [35, 36] and Keller, et al. [5, 37, 38]. The two 
different techniques share the common idea that class labeling for segmentation 
or object labeling for recognition should be viewed as an aggregation of evidence 
problem. The evidence can be derived from several sensors (for example, color), 
several distinct pattern recognition algorithms, different features, or the 
combination of image data with non-image information (intelligence). The 
advantages of multi-sensor fusion lie in redundancy, complementarity, timeliness 
and low cost of the information. The support for a decision may depend on 
supports for (or degrees of satisfaction of) several different criteria, and the 
degree of satisfaction of each criterion may in turn depend on degrees of 
satisfaction of other sub-criteria, and so on. Thus, the decision process can be 
viewed as a hierarchical network, where each node in the network "aggregates" the 
degree of satisfaction of a particular criterion from the observed evidence. The 
inputs to each node are the degrees of satisfaction of each of the sub-criteria, 
and the output is the aggregated degree of satisfaction of the criterion. 

For image segmentation as discussed in [35, 36], the decision making 
problem reduces to i) determining the structure of the aggregation network to 
be used, ii) determining the nature of the connectives at each node of the 
network, and iii) computing the input supports (degrees of satisfaction of 
criteria) based on observed features. 

Figure 2. Fuzzy segmentation finding the optimum number of classes. 
(a) Intensity image of natural scene and (b) range image of 
block; (c) & (d) optimal partition of top row using UFP-ONC 
algorithm; (e) feature space clusters for the block image. 

The structure of the aggregation network depends on the problem at 
hand [39]. The connectives used at each node of the network are based on fuzzy 
union, fuzzy intersection, or compensative operators (such as the generalized mean 
or the γ-model) [39]. The innovative aspect of this work is that a backpropagation 
algorithm (and convergence theory) was developed so that both the type of 
connective at each node and the parameters associated with the connective 
can be learned from training data [35, 36]. 
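
To make the two compensative connectives concrete, here is a minimal sketch of a single aggregation node. The formulas are the commonly cited weighted generalized mean and the Zimmermann-Zysno γ-model; the text names these operators but does not spell them out, so the exact forms, the function names, and the parameter names (`w`, `p`, `delta`, `gamma`) should be read as assumptions. No training is shown.

```python
import math

def generalized_mean(x, w, p):
    """Weighted generalized mean of inputs in (0, 1]; weights w sum to 1."""
    if p == 0:
        # Limit p -> 0 is the weighted geometric mean.
        return math.exp(sum(wi * math.log(xi) for xi, wi in zip(x, w)))
    return sum(wi * xi ** p for xi, wi in zip(x, w)) ** (1.0 / p)

def gamma_model(x, delta, gamma):
    """Gamma-model: blends a product (AND-like) and a dual product (OR-like)."""
    and_part = 1.0
    or_part = 1.0
    for xi, di in zip(x, delta):
        and_part *= xi ** di
        or_part *= (1.0 - xi) ** di
    return and_part ** (1.0 - gamma) * (1.0 - or_part) ** gamma
```

In both operators a single parameter moves the node between intersection-like and union-like behavior (p toward minus or plus infinity, gamma from 0 to 1), which is what makes the type of connective learnable from data.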

As an example, consider the fusion of information from different modal- 
ities for segmentation of outdoor scenes. In particular, the modalities considered 
are color images of size 256 x 256 (obtained from the University of Massachusetts) 
with intensity components r, g and b (red, green and blue). In an initial 
experiment, to keep the problem tractable, the following features were used as 
criteria: intensity value (r+g+b)/3, blue-red difference b-r, excess green 2g-r-b, 
and position (row number). The first three features correspond to the Ohta color 
space and were chosen because they have been found to correspond to 
meaningful colors and they have also been found to be effective for color image 
segmentation [40]. The position of a pixel is important for labels such as sky and 
road. The image was first median filtered, and the feature images were 
normalized so that all the values fall between 0 and 255. The following six labels 
were considered: sky, tree, roof, walls, grass, and road. In the example, we used 
one layer aggregation networks based on the generalized mean to determine the 
parameters of the network. About 60 training samples were taken from different 
parts of the image for each class. Membership values were calculated using the 
feature histograms of the training data. Since the histogram is very jagged, it 
had to be smoothed by a window of length 11 before normalizing it. After 
training, the network was used for segmentation of the image by assigning each 
pixel to the class which had the highest degree of satisfaction generated from the 
pixel's features. 
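
The feature extraction and histogram-based membership step can be sketched as follows. The three color features follow the definitions in the text; the moving-average smoothing with a window of 11 also follows the text, while scaling the smoothed histogram by its peak is an assumption made here to obtain memberships in [0, 1]. Function names are illustrative.

```python
def ohta_features(r, g, b):
    """Intensity, blue-red difference and excess green for one pixel."""
    return ((r + g + b) / 3.0, b - r, 2 * g - r - b)

def smoothed_membership(values, n_bins=256, window=11):
    """Membership function from a training histogram of values in 0..n_bins-1.

    The histogram is smoothed with a moving average and then divided by its
    peak (assumed normalization) so that the largest membership is 1.
    """
    hist = [0.0] * n_bins
    for v in values:
        hist[int(v)] += 1.0
    half = window // 2
    smooth = []
    for i in range(n_bins):
        lo, hi = max(0, i - half), min(n_bins, i + half + 1)
        smooth.append(sum(hist[lo:hi]) / (hi - lo))
    peak = max(smooth)
    return [s / peak for s in smooth]
```

A pixel's degree of satisfaction for a class is then read off as `membership[feature_value]` for each feature and passed to the aggregation network.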

Figure 3a shows the original intensity image and Figure 3b shows the 
segmented and labeled image when the γ-model was used as the aggregation 
operator. The labels in increasing order of grey level are: road, tree, wall, roof, 
grass, and sky. The results are excellent, considering the small number of 
features used and the simplicity of the network employed. Note that most of the 
misclassifications occur at areas where the true label is not any of the six labels 
considered. This segmented image was improved by a shrink-and-expand 
operator and this image is shown in Figure 3c. An important point here is that 
this method not only partitions the image into connected components of similar 
properties, but also labels these components. In other words, it produces both 
a segmentation and a region recognition simultaneously, while capturing an 
abstract model of the decision making process. 

The fuzzy integral has also been used to fuse both objective information 
from features and (possibly subjective) information on the importance of subsets 
of features for segmentation in [5, 37]. This approach will be described in the 
section on object and region recognition. 

Figure 3. Multispectral segmentation by hierarchical fuzzy aggregations. 
(a) Intensity image of natural scene; (b) Six class segmentation 
and labeling; (c) Figure 3b cleaned up by shrink-and-expand 
operator. 

BOUNDARY DETECTION 

Boundary detection is another approach to segmentation. In this 
approach, an edge operator is first used on the image to detect edge elements. 
The edge elements so detected are considered to be part of the boundaries 
between various objects or regions in the image. The boundaries are sometimes 
described in terms of analytical curves such as straight lines, circles, and other 
higher degree curves. 

The FCM algorithm can be used to detect (or fit) straight lines to edge 
elements. This is achieved by initializing the FCM with c linear prototypes 
rather than c centers. Each linear prototype consists of a point (which acts as 
cluster center) and a parameter defining the orientation of the cluster. The 
fuzzy covariance matrix $F_i$ of each cluster (as defined in (4)) may be used to 
define its orientation since its principal eigenvector gives the direction of 
maximum variance of the cluster. The c prototypes are updated in each iteration 
as described in the previous section except that in each iteration the covariance 
matrix of each cluster is also updated. Several distance measures may be used 
for the detection of lines. One of them is defined by 

$$d^2(x_k, v_i) = \alpha_i D_{ik}^2 + (1 - \alpha_i)\, d_{ik}^2$$

where $D_{ik}$ is the distance of the point $x_k$ from the line and $d_{ik}$ is the Euclidean 
distance between $x_k$ and $v_i$. $\alpha_i$ is chosen as $1 - (\lambda_{i2}/\lambda_{i1})$, where $\lambda_{i2}$ and $\lambda_{i1}$ are 
the smaller and larger eigenvalues of cluster $i$ [41]. We have shown that the 
scaled Mahalanobis distance given by 

$$d^2(x_k, v_i) = |F_i|^{1/2} (x_k - v_i)^T F_i^{-1} (x_k - v_i) \qquad (6)$$

is also very effective for the detection of lines or linear clusters [42]. In (6), $F_i$ 
is the fuzzy covariance matrix of cluster $i$ as defined in (4). As mentioned 
earlier, one problem with the FCM is that the number of clusters needs to be 
specified. In the line detection case, one way to overcome this is to specify a 
relatively high value of c and then merge compatible clusters after the algorithm 
converges [42]. Figure 4 shows an example of this method. Figure 4a shows the 
original image. This image is equivalent to the threshold output of an edge 
operator (such as the Sobel operator) on an intensity image of the characters 
UMC. Figure 4b shows the clustering when c was specified to be 14. Note that 
the leading stroke of both the U and the M are split into two subclusters (in 
some examples the initial cluster organization is much worse). Figure 4c shows 
the clustering after compatible clusters are merged. The final optimal number 
of clusters was determined to be 10, which is correct in this case. In this 
implementation, two (or more) clusters were considered compatible if i) their 
orientation was the same, ii) the line joining their centers had the same orien- 
tation as the clusters and iii) the cluster centers were not more than 4 principal 
eigenvalues apart. The lines so found by the algorithm can then be used to 
describe large sections of the boundary or the linear substructures in the image. 

Figure 4. Segmentation by boundary detection. (a) Simulated thresholded 
edge output; (b) Output of modified FCM to detect linear 
clusters (c = 14); (c) Optimal linear partitions by compatible 
cluster merge algorithm (c = 10). 

The FCM algorithm with linear prototypes may be generalized to detect 
combinations of subspaces [43-44] and also non-linear clusters such as circles 
[45]. 
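
The line-detection quantities described in this section can be sketched for the two-dimensional case. The code below derives the cluster orientation and the mixing weight from the eigenvalues of a symmetric 2x2 covariance matrix and evaluates the scaled Mahalanobis distance of (6). The function names are illustrative, and the covariance matrix is assumed to be given.

```python
import math

def line_parameters(F):
    """Eigenvalues, orientation and mixing weight of a symmetric 2x2 matrix F.

    Returns (lam_max, lam_min, theta, alpha), where theta is the angle of the
    principal eigenvector and alpha = 1 - lam_min / lam_max.
    """
    a, b, d = F[0][0], F[0][1], F[1][1]
    mean = 0.5 * (a + d)
    root = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    lam_max, lam_min = mean + root, mean - root
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    return lam_max, lam_min, theta, 1.0 - lam_min / lam_max

def scaled_mahalanobis_sq(x, v, F):
    """Scaled Mahalanobis distance of (6) for 2-D points."""
    a, b, d = F[0][0], F[0][1], F[1][1]
    det = a * d - b * b
    dx, dy = x[0] - v[0], x[1] - v[1]
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.sqrt(det) * quad
```

For an elongated cluster, points displaced along the principal direction receive a much smaller distance than points displaced across it, which is what lets the prototypes lock onto line segments.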


There are numerous techniques for incorporating fuzzy set theoretic 
operators into the segmentation process of which we have only highlighted a few. 
It is our belief that the benefits of producing fuzzy subsets of the image will 
encourage more research into the utilization of fuzzy approaches to this crucial 
aspect of computer vision. 

OBJECT/REGION RECOGNITION AND HIGH LEVEL VISION 

The area of computer vision concerned with assigning meaningful labels 
to regions in an image can be thought of as a subset of pattern recognition. 
There is a large amount of research in the use of fuzzy set theory in pattern 
recognition, but here we will only discuss a few approaches for object recognition 
in image analysis. 

As was seen in the previous section, the fuzzy-connective-based hierarch- 
ical aggregation networks not only segmented an image, but also provided class 
labels for each pixel based on local feature evidence and training information. 
Normally, once segmentation has been completed, features are computed for the 
entire region and this data is used to classify the areas found. The aggregation 
networks can function well in this setting also. The reader is referred to [13] for 
several examples of object recognition using fuzzy aggregation networks. 

The fuzzy integral is another numeric-based approach which we have 
used for both segmentation and object recognition [5,14,37,38]. It also uses a 
hierarchical network of evidence sources to arrive at a confidence value for a 
particular hypothesis or decision. The difference from the preceding method 
is that, besides the directly supplied objective evidence, the fuzzy integral utilizes 
information concerning the worth or importance of the sources in the decision 
making process. 

The fuzzy integral relies on the concept of a fuzzy measure which 
generalizes probability measure in that it does not require additivity, replacing 
it with a weaker continuity condition. A particularly useful set of fuzzy measures 
is due to Sugeno [46]. A fuzzy measure $g_\lambda$ is called a Sugeno measure if it 
satisfies the following additional property: 

If $A \cap B = \emptyset$, then $g_\lambda(A \cup B) = g_\lambda(A) + g_\lambda(B) + \lambda\, g_\lambda(A)\, g_\lambda(B)$, 

for some $\lambda > -1$. 

Suppose X is a finite set, $X = \{x_1, \ldots, x_n\}$, and let $g^i = g_\lambda(\{x_i\})$. Then the set 
$\{g^1, \ldots, g^n\}$ is called the fuzzy density function for $g_\lambda$. 

Using the above definitions one can easily show that $g_\lambda$ can be 
constructed from a fuzzy density function by 

$$g_\lambda(A) = \frac{\prod_{x_i \in A} (1 + \lambda g^i) - 1}{\lambda}$$

for any subset A of X. Using the fact that $X = \bigcup_{i=1}^{n} \{x_i\}$, $\lambda$ can be determined 
from the above equation. 
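
Determining $\lambda$ amounts to solving $\prod_i (1 + \lambda g^i) = 1 + \lambda$, since the measure of the whole set must equal 1. The sketch below does this by bisection and then evaluates the measure of a subset; the bisection approach, the function names, and the densities in the usage note are illustrative assumptions, not values from the paper.

```python
def solve_lambda(densities):
    """Find lambda > -1 with prod(1 + lambda * g_i) = 1 + lambda by bisection."""
    total = sum(densities)
    if abs(total - 1.0) < 1e-9:
        return 0.0  # densities are already additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total < 1.0:
        lo, hi = 1e-12, 1.0
        while f(hi) < 0.0:  # expand until the positive root is bracketed
            hi *= 2.0
            if hi > 1e12:
                raise ValueError("no root found; need at least two positive densities")
    else:
        lo, hi = -1.0 + 1e-12, -1e-12  # root lies in (-1, 0)
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sugeno_measure(subset_densities, lam):
    """Measure of a subset, given the densities of its elements."""
    if lam == 0.0:
        return sum(subset_densities)
    prod = 1.0
    for g in subset_densities:
        prod *= 1.0 + lam * g
    return (prod - 1.0) / lam
```

For example, with the made-up densities 0.2, 0.3 and 0.1 the densities sum to less than 1, so the solved $\lambda$ is positive and the measure of any subset exceeds the sum of its densities.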

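Let $h: X \to [0,1]$. The fuzzy integral of h over X with respect to $g_\lambda$ is 
defined in [46] by 

$$\int_X h(x) \circ g_\lambda = \sup_{\alpha \in [0,1]} \left[ \alpha \wedge g_\lambda(F_\alpha) \right]$$

where $F_\alpha = \{x \in X \mid h(x) \ge \alpha\}$. 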

In our applications, the set X is the set of information sources (sensors, 
algorithms, features, etc.) and the function h supplies a confidence value for a 
particular hypothesis or class from the standpoint of each individual source of 
information. The fuzzy measure supplies the expected worth of each subset of 
sources from this hypothesis. 

If $X = \{x_1, \ldots, x_n\}$ is a finite set, arranged so that $h(x_1) \ge h(x_2) \ge \ldots \ge 
h(x_n)$, then 

$$\int_X h(x) \circ g_\lambda = \bigvee_{i=1}^{n} \left[ h(x_i) \wedge g_\lambda(X_i) \right]$$

where $X_i = \{x_1, \ldots, x_i\}$. Also, given $\lambda$ as calculated above, the values $g_\lambda(X_i)$ can 
be determined recursively from the definitions [46]. The fuzzy integral is 
interpreted as an evaluation of object classes where the subjectivity is embedded 
in the fuzzy measure. In comparison with probability theory, the fuzzy integral 
corresponds to the concept of expectation. In general, fuzzy integrals are 
nonlinear functionals (although monotone) whereas ordinary (e.g., Lebesgue) 
integrals are linear functionals. 
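
For a finite set of sources, the integral reduces to a sort followed by a running max of mins. The following sketch applies the recursion $g_\lambda(X_i) = g^i + g_\lambda(X_{i-1}) + \lambda g^i g_\lambda(X_{i-1})$, which follows from the Sugeno property above; the function name and the example inputs are hypothetical, and $\lambda$ is assumed to have been computed already.

```python
def sugeno_integral(h, densities, lam):
    """Fuzzy integral of h over a finite set with respect to a Sugeno measure.

    h[i] is the confidence supplied by source i, densities[i] its fuzzy
    density, and lam the parameter of the measure.
    """
    order = sorted(range(len(h)), key=lambda i: h[i], reverse=True)
    g_prev = 0.0   # measure of the empty set
    best = 0.0
    for i in order:
        # Measure of X_i = X_{i-1} united with {x_i}.
        g_cur = densities[i] + g_prev + lam * densities[i] * g_prev
        best = max(best, min(h[i], g_cur))
        g_prev = g_cur
    return best
```

With two equally important sources (densities 0.5 each, $\lambda = 0$) and confidences 0.9 and 0.3, the result is 0.5: the strong source alone cannot push the value above its own worth.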

As an example, the fuzzy integral algorithm was tested using forward 
looking infrared (FLIR) images containing two tanks and an armored personnel 
carrier (APC) [38]. There were three sequences of 100 frames each used for 
training purposes. In each sequence, the vehicles appeared at a different aspect 
angle to the sensor (0°, 45°, 90°). In the fourth sequence the APC "circled" one 
of the tanks, moving in and out of a ravine and finally coming toward the sensor. 
This sequence was used to perform the comparison tests. The images were 
preprocessed to extract object of interest windows. The classification level 
integration was performed using four statistical features calculated from the 
windows. To get the partial evaluation, h(x), for each feature, the fuzzy two- 
mean algorithm [16] was used. The fuzzy densities, the degree of importance of 
each feature, were assigned based on how well these features separated the two 
classes Tank and APC on training data [38]. The result of the fuzzy integral 
classifier is presented in the form of a confusion matrix in Table 1, where the 
samples counted in each row are those that belong to the corresponding 
class and the samples counted in each column are those assigned to that class 
by the classifier, which chooses the class with the largest integral value. 

The fuzzy integral outperformed a simple Bayes classifier on this data, 
but more importantly, the final integral values provide a different measure of 
certainty in the classification than posterior probabilities. The integral evaluation 
need not sum to one, so that lack of evidence and negative evidence can be 
distinguished. 

This approach was also compared to a Dempster-Shafer rule-based 
classifier [47]. A conceptual difference between the fuzzy integral and a 
Dempster-Shafer classifier is in the frame of discernment [48]. For the fuzzy 
integral the frame of discernment contains the knowledge sources related to the 
hypothesis under consideration, whereas with belief theory, the frame of discern- 
ment contains all of the possible hypotheses. Thus the fuzzy integral algorithm 
has a means to assess the importance of all groups of knowledge sources towards 
answering the questions as well as the degree to which each knowledge source 
supports the hypothesis. With belief theory, each knowledge source would have 
to generate a belief function over the power set of the set of hypotheses, which 
are then combined using Dempster's rule. This calculation can have exponential 
complexity with the number of hypotheses. With the fuzzy integral, the Sugeno 
measure need only be calculated for n subsets (where n is the number of know- 
ledge sources for each hypothesis). These measures are then combined with the 
objective evidence to produce the integral values. 

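TABLE 1. Fuzzy Integral Classifier for a Two Class ATR Problem 

Computed densities ($g^1$ to $g^4$) and $\lambda$ values 
(density entries are listed in source order; the remaining entries are missing) 

  Tank   0.23   0.22          $\lambda$ = 0.760 
  APC    0.15   0.24   0.23   $\lambda$ = 0.764 

Confusion Matrix 

            Tank    APC 
  Tank       175      1 
  APC         17     49 

Total correct 92.6% 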




Recently, Tahani has extended this information fusion approach to a 
large family of ^-decomposable measures and generalized the definition of the 
fuzzy integral, thereby significantly increasing the flexibility of this powerful tool 
[14]. 

The above techniques, as well as many other fuzzy pattern recognition 
algorithms, are numeric feature-based procedures. On the other hand, fuzzy 
logic, and in general possibility theory, is inherently set-based, and so offers the 
potential to manipulate higher order concepts. For example, in [49] (and refined 
in [50]) Keller et al. used linguistic weighted averaging of possibility distributions 
[51] to generate object confidence from a combination of feature level results 
and harder-to-quantify values relating to range and motion. Rough estimates of 
object range and motion were used to construct trapezoidal possibility 
distributions which were averaged, using alpha-level set methods [51], with 
similar trapezoidal numbers formed from the output of fuzzy pattern recognition 
algorithms such as the fuzzy k-nearest-neighbors [52]. In [50] we developed a 
scaling technique to actually turn the averaging procedure into a confidence 
fusion methodology overcoming the spreading inherent in fuzzy arithmetic. 

In [53], normalized histograms of color components of images of beef 
steaks were used directly in a linguistic approximation scheme to assess the 
degree-of-doneness of the steak. It was felt that because of the large amount of 
uncertainty inherent in food processing, the entire distribution of color (primarily 
in the red and brown regions) was important for class recognition. Note that 
this is conceptually distinct from those techniques described earlier which used 
normalized histograms of training data to calculate membership numbers for 
particular instances of the domain variable. Here, the object (a steak image) is 
represented by a group of fuzzy sets (various color histograms) and a set-based 
nearest prototype algorithm was used to assign class labels and confidences. 

Rule-based systems have gained popularity in computer vision 
applications, particularly in high level vision activities. In guiding the choice of 
parameters for low level algorithms, a vision knowledge base may have a rule 
such as 


IF the range is LONG, THEN 

the prescreener window size is SMALL. 

If LONG and SMALL are modeled by possibility distributions over appropriate 
domains of discourse, then fuzzy logic offers numerous approaches to translate 
such rules and to make inferences from the rules and facts modeled similarly. 
Nafarieh and Keller [54] designed a fuzzy logic rule based system for automatic 
target recognition which contained the above rule and approximately 40 other 
such rules. 

Most fuzzy logic inference is based on Zadeh's composition rule. This 
generalizes traditional modus ponens, which states that from the propositions 

    $P_1$: If X is A Then Y is B 
and $P_2$: X is A, 

we can deduce Y is B. If proposition $P_2$ did not exactly match the antecedent 
of $P_1$, for example, X is A', then the modus ponens rule would not apply. 
However, in [55], Zadeh extended this rule if A, B, and A' are modeled by fuzzy 
sets. In this case, X and Y are fuzzy variables [55] defined over universes of 
discourse U and V respectively. As described above, the propositions X is A and 
Y is B, where A and B are fuzzy subsets of U and V respectively, generate 
possibility distributions for the variables X and Y. The proposition $P_1$ concerns 
the joint fuzzy variable (X,Y) and is characterized by a fuzzy set over the cross 
product space U x V. Specifically, $P_1$ is characterized by a possibility 
distribution: 

$$\Pi_{(X,Y)} = R, \quad \text{where} \quad \mu_R(u,v) = \max\{(1 - \mu_A(u)),\ \mu_B(v)\}.$$

It should be noted that this formula corresponds to the statement "not 
A or B", the logical translation of $P_1$. Zadeh now makes the inference Y is B' 
from $\mu_R$ and $\mu_{A'}$ by 

$$\mu_{B'}(v) = \max_{u} \{\min\{\mu_R(u,v),\ \mu_{A'}(u)\}\}.$$

While this formulation of fuzzy inference, called the composition rule, 
directly extends modus ponens, it suffers from some problems. In fact, if 
proposition $P_2$ is X is A, the resultant fuzzy set is not exactly the fuzzy set B. 
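
A small discrete example makes the last remark concrete. The sketch below builds the relation R from the "not A or B" translation and applies the max-min composition; the membership vectors in the usage note are made-up values over two-element universes, chosen only for illustration.

```python
def implication_relation(mu_a, mu_b):
    """R(u, v) = max(1 - A(u), B(v)) over discrete universes U and V."""
    return [[max(1.0 - a, b) for b in mu_b] for a in mu_a]

def compose(mu_a_prime, relation):
    """B'(v) = max over u of min(R(u, v), A'(u))."""
    n_u = len(mu_a_prime)
    n_v = len(relation[0])
    return [
        max(min(relation[u][v], mu_a_prime[u]) for u in range(n_u))
        for v in range(n_v)
    ]
```

Taking A = [1.0, 0.5] and B = [0.2, 1.0] and feeding the exact antecedent back in, `compose(A, implication_relation(A, B))` gives [0.5, 1.0] rather than B: the partially matching element of U raises the floor of the conclusion.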

Besides changing the way in which $P_1$ is translated into a possibility 
distribution, methods involving truth modification have been proposed. In this 
approach, the proposition X is A' is compared with X is A, and the degree of 
compatibility is used to modify the membership function of B to get that for B'. 

A fuzzy truth value restriction $\tau$ is a fuzzy subset of X = [0,1], and can 
be defined by its membership function $\mu_\tau$, which is a mapping 

$$\mu_\tau : X \to [0,1].$$


For example, we can define fuzzy truth value restrictions true, very true, false, 
unknown, absolutely true, absolutely false, etc. 

In the truth value restriction methodology, the degree to which the 
actual given value A' of a variable X agrees with the antecedent value A in a 
proposition If X is A then Y is B is represented as a fuzzy subset of a truth 
space. This fuzzy subset of truth space is what is referred to by the phrase truth 
value restriction; it is used in a fuzzy deduction process to determine the 
corresponding restriction on the truth value of the proposition Y is B. This latter 
truth value restriction is then "inverted", which means that a fuzzy proposition Y 
is B' in the Y universe of discourse is found such that its agreement with Y is 
B is equal to the truth value restriction derived by the aforementioned fuzzy 
inference process. That is, $\mu_{B'}(v) = \mu_\tau(\mu_B(v))$. The rule-based system 
described in [54] utilized a new inference technique based on truth value 
restriction which outperformed most methods of fuzzy logic inference when the 
inputs were exponentially defined functions of the antecedent clause (VERY 
LONG, MORE-OR-LESS LONG, etc.). 

To ease the computational burden of performing modus ponens 
inferences with fuzzy sets, and to preserve the generalization capability, we 
introduced neural network architectures to accomplish the fuzzy logic inferences. 
These architectures could be trained on multiple conjunctive or disjunctive 
antecedent clause rules and could actually store several compatible rules in one 
structure, providing a natural method of conflict resolution [56-58]. 

CONCLUSIONS 

The use of fuzzy set theory is growing in computer vision as it is in all 
intelligent processing. The representation capability is flexible and intuitively 
pleasing, the combination schemes are mathematically justifiable and can be 
tailored to the particular problem at hand from low level aggregation to high 
level inferencing, and the results of the algorithms are excellent, producing not 
only crisp decisions when necessary, but also corresponding degrees of support. 

There is much work left to be done at all levels of computer vision. 
One area of particular need is the calculation and subsequent use of (fuzzy) 
features from the output of fuzzy segmentation algorithms. More research is also 
necessary in high level vision processes. Fuzzy set theory offers excellent poten- 
tial for describing and manipulating object and region relationships, thereby as- 
sisting with scene interpretation. Finally, we believe that possibility distributions 
should be the model for the interface between (1) the human and the vision 
system and (2) high level vision subsystem and mid or low level vision processes. 

This paper represents a short survey of fuzzy set methods in computer 
vision. Once again we apologize to all whose work we have inadvertently 
omitted from review. We strongly believe in the potential of fuzzy set theory to 
solve increasingly difficult computer vision problems, and hope that this survey 
will increase research in this area. 

INTRODUCTION 

An application of the theory of fuzzy subsets to image processing and scene 
analysis problems has been described here. The problems considered are 
(pre)processing of 2-dimensional image pattern, extraction of primitives, and 
recognition and interpretation of image. 

A gray tone picture possesses some ambiguity within the pixels due to the 
possible multivalued levels of brightness. The incertitude in an image may arise 
from grayness ambiguity or spatial (geometrical) ambiguity or both. Grayness 
ambiguity means "indefiniteness" in deciding a pixel as white or black. Spatial 
ambiguity refers to "indefiniteness" in shape and geometry of a region e.g., where is 
the boundary or edge of a region? or is this contour "sharp"? 

When the regions in an image are ill-defined (fuzzy), it is natural and also 
appropriate to avoid committing ourselves to a specific (hard) decision, e.g., 
segmentation/thresholding and skeletonization, by allowing the segments or skeletons 
or contours to be fuzzy subsets of the image. Similarly, for describing and 
interpreting ill-defined structural information in a pattern (when the pattern 
indeterminacy is due to inherent vagueness rather than randomness), it is natural to 
define primitives and relations among them using labels of fuzzy sets. For example, 
primitives may be defined in terms of arcs with varying grades of membership from 0 
to 1, and production rules of a grammar may be fuzzified to account for the fuzziness 
in physical relations among the primitives, thereby increasing the generative power of 
a grammar. 

The first part of the article consists of a definition of an image in the light of 
fuzzy set theory, and various information measures (arising from fuzziness) and tools 
relevant for processing e.g., fuzzy geometrical properties, correlation, bound 
functions and entropy measures. The second part provides formulation of various 
algorithms along with management of uncertainties (ambiguities) for image 
enhancement, edge detection, skeletonization, filtering, segmentation and object 
extraction. Ambiguity in evaluation and assessment of membership function has 
also been described here. The third part describes the way of extracting various fuzzy 
primitives in order to describe the contours of different object regions of an image. 
Finally the fuzzy grammars are used to demonstrate how syntactic algorithms can be 
formulated for identifying different region structures/classes of patterns. The above 
features have been illustrated through examples and various image data. 

* Dr. Pal is on leave from the post of Professor in the Electronics and 
Communication Sciences Unit, Indian Statistical Institute, Calcutta 700035, 
India. 

IMAGE DEFINITION 

An image X of size MxN and L levels can be considered as an array of fuzzy 
singletons, each having a value of membership denoting its degree of brightness 
relative to some brightness level $l$, $l = 0, 1, 2, \ldots, L-1$. In the notation of fuzzy 
sets, we may therefore write 

$$X = \{\mu_X(x_{mn}) = \mu_{mn}/x_{mn};\ m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N\} \qquad (1)$$

or 

$$X = \bigcup_{m} \bigcup_{n} \mu_{mn}/x_{mn}, \quad m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N$$

where $\mu_X(x_{mn})$ or $\mu_{mn}/x_{mn}$ $(0 \le \mu_{mn} \le 1)$ 
denotes the grade of possessing some property $\mu_{mn}$ (e.g., brightness, edginess, 
smoothness) by the (m,n)th pixel intensity $x_{mn}$. In other words, a fuzzy subset of 
an image X is a mapping $\mu$ from X into [0, 1]. For any point $p \in X$, $\mu(p)$ is 
called the degree of membership of p in $\mu$. 

One may use either global or local information of an image in defining a 
membership function characterizing some property. For example, brightness or 
darkness property can be defined only in terms of gray value of a pixel $x_{mn}$, whereas 
edginess, darkness or textural property need the neighborhood information of a pixel 
to define their membership functions. Similarly, positional or co-ordinate 
information is necessary, in addition to gray level and neighborhood information, to 
characterize a dynamic property of an image. 

Again, the aforesaid information can be used in a number of ways (in their 
various functional forms), depending on an individual's opinion and/or the problem 
at hand, to define a requisite membership function for an image property. 

MEASURES OF FUZZINESS AND IMAGE 
INFORMATION 

The definitions of various measures which represent grayness ambiguity in an 
image (based on individual pixel as well as a collection of pixels) are listed below. 

Linear Index of Fuzziness 

$$\gamma_l(X) = \frac{2}{MN} \sum_{m} \sum_{n} \left| \mu_{mn} - \underline{\mu}_{mn} \right| = \frac{2}{MN} \sum_{m} \sum_{n} \min(\mu_{mn},\ 1 - \mu_{mn}) \qquad (2)$$

m = 1, 2, . . . M; n = 1, 2, . . . N 

Quadratic Index of Fuzziness 

$$\gamma_q(X) = \frac{2}{\sqrt{MN}} \left[ \sum_{m} \sum_{n} \{\mu_{mn} - \underline{\mu}_{mn}\}^2 \right]^{0.5} \qquad (3)$$

m = 1, 2, . . . M; n = 1, 2, . . . N 

Entropy 

$$H(X) = \frac{1}{MN \ln 2} \sum_{m} \sum_{n} S_n(\mu_{mn}) \qquad (4)$$

with $S_n(\mu_{mn}) = -\mu_{mn} \ln \mu_{mn} - (1 - \mu_{mn}) \ln(1 - \mu_{mn})$ 

m = 1, 2, . . . M; n = 1, 2, . . . N 

$\mu_{mn}$ denotes the degree of possessing some property $\mu$ by the (m, n)th pixel 
$x_{mn}$. $\underline{\mu}_{mn}$ denotes the nearest two tone version of $\mu_{mn}$. 


rth Order Entropy 

$$H^r(X) = -\frac{1}{k} \sum_{i} \left[ \mu(s_i^r) \log\{\mu(s_i^r)\} + \{1 - \mu(s_i^r)\} \log\{1 - \mu(s_i^r)\} \right] \qquad (5)$$

i = 1, 2, . . . k 

$s_i^r$ denotes the ith combination (sequence) of r pixels in X. k is the number of such 
sequences. $\mu(s_i^r)$ denotes the degree to which the combination $s_i^r$, as a whole, 
possesses the property $\mu$. 

Hybrid Entropy 

$$H_{hy}(X) = -P_w \log E_w - P_b \log E_b \qquad (6)$$

with 

$$E_w = \frac{1}{MN} \sum_{m} \sum_{n} \mu_{mn} \exp(1 - \mu_{mn})$$

$$E_b = \frac{1}{MN} \sum_{m} \sum_{n} (1 - \mu_{mn}) \exp(\mu_{mn})$$

m = 1, 2, . . . M; n = 1, 2, . . . N 

$\mu_{mn}$ denotes the degree of "whiteness" of the (m, n)th pixel. $P_w$ and $P_b$ denote 
probability of occurrences of white ($\mu_{mn} = 1$) and black ($\mu_{mn} = 0$) pixels respectively. 
$E_w$ and $E_b$ denote the average likeliness (possibility) of interpreting a pixel as white 
and black respectively. 
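
The linear index, quadratic index and entropy defined above are simple sums over the membership plane. The sketch below computes them for an image given as a list of rows of membership values; it uses the identity that the distance of $\mu_{mn}$ from its nearest two-tone value equals $\min(\mu_{mn}, 1 - \mu_{mn})$. The function names are illustrative, and how memberships are obtained from gray levels is left open.

```python
import math

def _flatten(mu):
    return [u for row in mu for u in row]

def linear_index(mu):
    """Linear index of fuzziness, equation (2)."""
    flat = _flatten(mu)
    return 2.0 / len(flat) * sum(min(u, 1.0 - u) for u in flat)

def quadratic_index(mu):
    """Quadratic index of fuzziness, equation (3)."""
    flat = _flatten(mu)
    total = sum(min(u, 1.0 - u) ** 2 for u in flat)
    return 2.0 / math.sqrt(len(flat)) * math.sqrt(total)

def entropy(mu):
    """Entropy of the fuzzy property plane, equation (4)."""
    def shannon(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0  # limit of -u ln u as u -> 0
        return -u * math.log(u) - (1.0 - u) * math.log(1.0 - u)
    flat = _flatten(mu)
    return sum(shannon(u) for u in flat) / (len(flat) * math.log(2.0))
```

All three return 0 for a two-tone image and 1 when every pixel has membership 0.5, the most ambiguous case.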

Correlation 

$$C(\mu_1, \mu_2) = 1 - \frac{4}{X_1 + X_2} \sum_{m} \sum_{n} \{\mu_{1mn} - \mu_{2mn}\}^2 \qquad (7)$$

and 

$$C(\mu_1, \mu_2) = 1 \quad \text{if } X_1 + X_2 = 0$$

with 

$$X_1 = \sum_{m} \sum_{n} \{2\mu_{1mn} - 1\}^2, \qquad X_2 = \sum_{m} \sum_{n} \{2\mu_{2mn} - 1\}^2$$

m = 1, 2, . . . M; n = 1, 2, . . . N 

$C(\mu_1, \mu_2)$ denotes the correlation between two properties $\mu_1$ and $\mu_2$ (defined over 
the same domain). $\mu_{1mn}$ and $\mu_{2mn}$ denote the degree of possessing the properties 
$\mu_1$ and $\mu_2$ respectively by the (m, n)th pixel. 
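
A minimal sketch of the correlation between two property planes is given below, assuming the form of (7) in which the summed squared difference is divided by $X_1 + X_2$. The function name is illustrative, and both planes are passed as lists of rows of the same shape.

```python
def fuzzy_correlation(mu1, mu2):
    """Correlation between two membership planes defined over the same image."""
    f1 = [u for row in mu1 for u in row]
    f2 = [u for row in mu2 for u in row]
    x1 = sum((2.0 * u - 1.0) ** 2 for u in f1)
    x2 = sum((2.0 * u - 1.0) ** 2 for u in f2)
    if x1 + x2 == 0.0:
        return 1.0  # both planes are 0.5 everywhere
    diff = sum((a - b) ** 2 for a, b in zip(f1, f2))
    return 1.0 - 4.0 * diff / (x1 + x2)
```

The value is 1 for identical planes and -1 when one crisp plane is the complement of the other, so it behaves like a correlation coefficient on [-1, 1].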

These expressions (equations 2-7) are the versions extended to the two-dimensional 
image plane from those defined for a fuzzy set. For example, index of fuzziness was 
defined by Kaufmann [1], entropy by DeLuca and Termini [2], rth order entropy and 
hybrid entropy by Pal and Pal [3], and correlation by Murthy, Pal and Dutta 
Majumdar [4]. 

Index of fuzziness reflects the ambiguity present in an image by measuring the 
distance between its fuzzy property plane and the nearest ordinary plane. The term 
"entropy", on the other hand, uses Shannon's function in the property plane but its 
meaning is quite different from the one of classical entropy because no probabilistic 
concept is needed to define it. $H^r(X)$ gives a measure of the average amount of 
difficulty in taking a decision on any subset of size r with respect to an image 
property. If r = 1, $H^r(X)$ reduces to (unnormalized) H(X) of equation (4). $H_{hy}(X)$ 
represents an amount of difficulty in deciding whether a pixel possesses certain 
properties or not by making a prevision on its probability of occurrence. In absence 
of fuzziness (i.e., with proper defuzzification), $H_{hy}$ reduces to two state classical 
entropy of Shannon, the states being black and white. Since a fuzzy set is a 
generalized version of an ordinary set, the entropy of a fuzzy set deserves to be a 
generalized version of classical entropy by taking into account not only the fuzziness 
of the set but also the underlying probability structure. In that respect, $H_{hy}$ can be 
regarded as a generalized entropy such that classical entropy becomes its special case 
when fuzziness is properly removed. 

All these terms, which give an idea of 'indefiniteness' or fuzziness of an image 
may be regarded as the measures of average intrinsic information which is received 
when one has to make a decision (as in pattern analysis) in order to classify the 
ensembles of patterns described by a fuzzy set. 


$\gamma(X)$ and H(X) are normalized in the interval [0, 1] such that

Pr 1: $\gamma_{min} = H_{min} = 0$ for $\mu_{mn} = 0$ or 1 for all (m, n)   (8a)

Pr 2: $\gamma_{max} = H_{max} = 1$ for $\mu_{mn} = 0.5$ for all (m, n)   (8b)

Pr 3: $\gamma(X) \ge \gamma(X^*)$ (or, $H(X) \ge H(X^*)$)   (8c)

and Pr 4: $\gamma(X) = \gamma(\bar{X})$ (or, $H(X) = H(\bar{X})$)   (8d)

where $\bar{X}$ is the complement of X and $X^*$ is the 'sharpened' or 'intensified' version of X such that

$$\mu_{X^*}(x_{mn}) \ge \mu_X(x_{mn}) \quad \text{if } \mu_X(x_{mn}) \ge 0.5$$

$$\text{and} \quad \mu_{X^*}(x_{mn}) \le \mu_X(x_{mn}) \quad \text{if } \mu_X(x_{mn}) \le 0.5 \tag{9}$$

In other words, $\gamma(X)$ or H(X) increases monotonically with $\mu$, reaches a maximum at $\mu = 0.5$ and then decreases monotonically. This is explained in Fig. 1.

Figure 1 Variation of Fuzziness with $\mu$.

According to property 8(c), these parameters decrease with contrast enhancement of an image. Now through processing, if we can partially remove the uncertainty on the grey levels of X, we say that we have obtained an average amount of information given by $\delta\gamma = \gamma(X) - \gamma(X^*)$ or $\delta H = H(X) - H(X^*)$ by taking a decision bright or dark on the pixels of X. The criteria $\gamma(X^*) < \gamma(X)$ and $H(X^*) < H(X)$, in order to have positive $\delta\gamma$ and $\delta H$ values, follow from Eq. (8c). If the uncertainty is completely removed, then $\gamma(X^*) = H(X^*) = 0$. In other words, $\gamma(X)$ and H(X) can be regarded as measures of the average amount of information (about the grey levels of pixels) which has been lost by transforming the classical pattern (two-tone) into a fuzzy pattern X.

It is to be noted that $\gamma(X)$ or H(X) reduces to zero as long as $\mu_{mn}$ is made 0 or 1 for all (m, n), no matter whether the resulting defuzzification (or transforming process) is correct or not. In the following discussion it will be clear how $H_{hy}$ takes care of this situation.

$H^r(X)$ has the following properties:

Pr 1: $H^r$ attains a maximum if $\mu_i = 0.5$ for all i.

Pr 2: $H^r$ attains a minimum if $\mu_i = 0$ or 1 for all i.

Pr 3: $H^r \ge H^{*r}$, where $H^{*r}$ is the rth order entropy of a sharpened version of the fuzzy set.

Pr 4: $H^r$ is, in general, not equal to $\bar{H}^r$, where $\bar{H}^r$ is the rth order entropy of the complement set.

Pr 5: $H^r \le H^{r+1}$ when all $\mu_i \in [0.5, 1]$;
      $H^r \ge H^{r+1}$ when all $\mu_i \in [0, 0.5]$.

Note that the property Pr 4 of equation 8(d) is not, in general, valid here. The additional property Pr 5 implies that $H^r$ is a monotonically nonincreasing function of r for $\mu_i \in [0, 0.5]$ and a monotonically nondecreasing function of r for $\mu_i \in [0.5, 1]$ (when the 'min' operator has been used to get the group membership value).

When all the $\mu_i$ values are the same, $H^1(X) = H^2(X) = . . . = H^r(X)$. This is because the difficulty in taking a decision regarding possession of a property on an individual is the same as that of a group selected therefrom. The value of $H^r$ would, of course, be dependent on the $\mu_i$ values.

Again, the higher the similarity among singletons, the quicker is the convergence to the limiting value of $H^r$. Based on this observation, let us define an index of similarity of supports of a fuzzy set as $S = H^1/H^2$ (when $H^2 = 0$, $H^1$ is also zero and S is taken as 1). Obviously, when $\mu_i \in [0.5, 1]$ and the min operator is used to assign the degree of possession of the property by a collection of supports, S will lie in [0, 1] as $H^r \le H^{r+1}$. Similarly, when $\mu_i \in [0, 0.5]$, S may be defined as $H^2/H^1$ so that S lies in [0, 1]. The higher the value of S, the more alike (similar) are the supports of the fuzzy set with respect to the property $\mu$. This index of similarity can therefore be regarded as a measure of the degree to which the members of a fuzzy set are alike.

Therefore, the value of conventional fuzzy entropy ($H^1$ or Eq. 4) can only indicate whether the fuzziness in a set is low or high. In addition to this, the value of $H^r$ also enables one to infer whether the fuzzy set contains similar supports (or elements) or not. The similarity index thus defined can be successfully used for measuring interclass and intraclass ambiguity (i.e., class homogeneity and contrast) in pattern recognition and image processing problems.
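The behaviour of the higher order entropy and of the similarity index can be explored with a small Python sketch. It is only an illustration under stated assumptions: every combination of r supports is enumerated (so k is the binomial coefficient), the group membership is the 'min' of its members, and the per-group measure is Shannon's function with base-2 logarithms so that it equals 1 at 0.5. The exact entropy function and normalisation are assumptions here, so intermediate memberships may give values that differ from published figures; the names `shannon`, `rth_order_entropy` and `similarity_index` are hypothetical.

```python
import math
from itertools import combinations

def shannon(mu):
    """Shannon's function, normalised to 1 at mu = 0.5 (assumed normalisation)."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -(mu * math.log2(mu) + (1.0 - mu) * math.log2(1.0 - mu))

def rth_order_entropy(mu, r):
    """Average ambiguity over all combinations of r supports.

    The membership of a combination is taken as the minimum of its members.
    """
    groups = list(combinations(mu, r))
    return sum(shannon(min(g)) for g in groups) / len(groups)

def similarity_index(mu):
    """S = H^1 / H^2, taken as 1 when H^2 = 0 (memberships assumed in [0.5, 1])."""
    h1 = rth_order_entropy(mu, 1)
    h2 = rth_order_entropy(mu, 2)
    return 1.0 if h2 == 0 else h1 / h2

# Ten supports: five fully possessing the property, five completely ambiguous.
supports = [1.0] * 5 + [0.5] * 5
h1 = rth_order_entropy(supports, 1)
h2 = rth_order_entropy(supports, 2)
s = similarity_index(supports)
```

For this set only the 10 pairs made of two crisp supports contribute nothing, so $H^2 = 35/45$ while $H^1 = 0.5$, illustrating that $H^r$ does not decrease with r when all memberships lie in [0.5, 1].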

The aforesaid features are explained in Table 1 when $\mu_i \in [0.5, 1]$, the min operator is used to compute group membership and k in Eq. 5 is considered to be $^{10}C_r$, r = 1, 2, . . . 6.

Table 1 : Higher Order Entropy

Case  $\mu_X$                                      $H^1$  $H^2$  $H^3$  $H^4$  $H^5$  $H^6$  S
1     {1, 1, 1, 1, 1, 1, 1, 1, 1, 1}             0      0      0      0      0      0      1
2     {.5, .5, .5, .5, .5, .5, .5, .5, .5, .5}   1      1      1      1      1      1      1
3     {1, 1, 1, 1, 1, .5, .5, .5, .5, .5}        .5     .777   .916   .976   .996   1      .642
4     {.5, .5, .5, .5, .5, .6, .6, .6, .6, .6}   .980   .991   .996   .999   .999   1      .989
5     {.6, .6, .65, .9, .9, .9, .9, .9, .9, .915} .538  .678   .781   .855   .905   .937   .793
6     {.8, .8, .8, .8, .8, .8, .9, .9, .9, .9}   .538   .613   .641   .649   .650   .650   .878
7     {.5, .5, .5, .5, .5, .5, .9, .9, .9, .9}   .748   .916   .979   .997   1      1      .816
8     {.7, .7, .7, .7, .7, .8, .8, .8, .8, .8}   .748   .802   .830   .841   .845   .846   .932

$H_{hy}(X)$ has the following properties. In the absence of fuzziness, when $MNP_b$ pixels become completely black ($\mu_{mn} = 0$) and $MNP_w$ pixels become completely white ($\mu_{mn} = 1$), then $E_w = P_w$, $E_b = P_b$ and $H_{hy}$ boils down to the two-state classical entropy

$$H_c = -P_w \log P_w - P_b \log P_b, \tag{10}$$


the states being black and white. Thus, $H_{hy}$ reduces to $H_c$ only when a proper defuzzification process is applied to detect (restore) the pixels. $|H_{hy} - H_c|$ can therefore act as an objective function for enhancement and noise reduction. The lower the difference, the lesser is the fuzziness associated with the individual symbols and the higher will be the accuracy in classifying them as their original value (white or black). (This property was lacking with the $\gamma(X)$ and H(X) measures (equations 2-4), which always reduce to zero irrespective of the defuzzification process.) In other words, $|H_{hy} - H_c|$ represents an amount of information which was lost by transforming a two tone image to a gray tone.

For a given $P_w$ and $P_b$ ($P_w + P_b = 1$, $0 \le P_w, P_b \le 1$), of all possible defuzzified versions, $H_{hy}$ is minimum for the properly defuzzified one.

If $\mu_{mn} = 0.5$ for all (m, n) then $E_w = E_b$

$$\text{and} \quad H_{hy} = -\log(0.5 \exp 0.5) \tag{11}$$

i.e., $H_{hy}$ takes a constant value and becomes independent of $P_w$ and $P_b$. This is logical in the sense that the machine is unable to take a decision on the pixels since all $\mu_{mn}$ values are 0.5.

Let us consider an example of a digital image in which, say, 70% of the pixels look white, while the remaining 30% look dark. Thus the probability of a white pixel $P_w$ is 0.7 and that of a dark pixel $P_b$ is 0.3. Suppose the whiteness of the pixels is not constant, i.e., there is a variation (grayness), and similar is the case with the black pixels.

Let us now consider the effect of improper defuzzification on the pattern shown in case 1 of Table 2. Two types of defuzzifications are considered here. In cases 2-4 all the symbols with $\mu = 0.5$ are transformed to zero when some of them were actually generated from symbol '1'. In cases 5-6 of Table 2 some of the $\mu$ values greater than 0.5 which were generated from symbol 1 (or belong to the white portion of the image) are wrongly defuzzified and brought down towards zero (instead of 1). In both situations, it is to be noted that $|H_c - H_{hy}|$ does not reduce to zero. Case 7, on the other hand, has all its elements properly defuzzified. As a result, $E_0$ and $E_1$ become 0.3 and 0.7 respectively and $|H_{hy} - H_c|$ reduces to zero.


Table 2: Effect of wrong defuzzification (with $P_0 = 0.3$ and $P_1 = 0.7$)

Case  $\mu$                                          $E_0$  $E_1$  $H_{hy}$  $|H_c - H_{hy}|$
1     {.9, .9, .8, .8, .7, .6, .5, .5, .4, .3}     .620   .876   .235      .375
2     {.999, .999, .9, .8, .7, .7, .3, .3, .2, .1} .576   .776   .342      .268
3     {1, 1, 1, .99, .9, .9, .1, .1, 0, 0}         .450   .648   .542      .068
4     {1, 1, 1, 1, 1, 1, 0, 0, 0, 0}               .400   .600   .632      .021
5     {.99, .99, .1, .1, .9, .8, .7, .2, .1, .1}   .630   .634   .456      .154
6     {1, 1, 0, 0, 1, 1, 1, 0, 0, 0}               .500   .500   .693      .082
7     {1, 1, 1, 1, 1, 1, 1, 0, 0, 0}               .300   .700   .611      0


$C(\mu_1, \mu_2)$ of equation (7) has the following properties.

a) If for higher values of $\mu_1(X)$, $\mu_2(X)$ takes higher values and the converse is also true then $C(\mu_1, \mu_2)$ must be very high.

b) If with increase of x, both $\mu_1$ and $\mu_2$ increase then $C(\mu_1, \mu_2) > 0$.

c) If with increase of x, $\mu_1$ increases and $\mu_2$ decreases or vice versa then $C(\mu_1, \mu_2) < 0$.

d) $C(\mu_1, \mu_1) = 1$

e) $C(\mu_1, \mu_1) \ge C(\mu_1, \mu_2)$

f) $C(\mu_1, 1 - \mu_1) = -1$

g) $C(\mu_1, \mu_2) = C(\mu_2, \mu_1)$

h) $-1 \le C(\mu_1, \mu_2) \le 1$

i) $C(\mu_1, \mu_2) = -C(1 - \mu_1, \mu_2)$

j) $C(\mu_1, \mu_2) = C(1 - \mu_1, 1 - \mu_2)$

IMAGE GEOMETRY 

The various geometrical properties of a fuzzy image subset (characterized by $\mu_X(x_{mn})$ or simply by $\mu$) as defined by Rosenfeld [5,6] and Pal and Ghosh [7] are given below with illustration. These provide measures of ambiguity in geometry (spatial domain) of an image.

A. Area The area of a fuzzy subset $\mu$ is defined as [5]

$$a(\mu) = \int \mu \tag{12}$$

where the integration is taken over a region outside which $\mu = 0$. For $\mu$ being piecewise constant (in case of digital image) the area is

$$a(\mu) = \sum \mu \tag{13}$$

where the summation is over a region outside which $\mu = 0$. Note from equation (13) that area is the weighted sum of the regions on which $\mu$ has constant value weighted by these values.

Example 1 Let $\mu$ be of the form
0.2 0.4 0.3
0.2 0.7 0.6
0.6 0.5 0.6

Area $a(\mu)$ = (0.2+0.4+0.3+0.2+0.7+0.6+0.6+0.5+0.6) = 4.1

B. Perimeter If $\mu$ is piecewise constant, the perimeter of $\mu$ is defined as [5]

$$p(\mu) = \sum_{i,j,k} |\mu(i) - \mu(j)| * |A(i, j, k)| \tag{14}$$

This is just the weighted sum of the lengths of the arcs A(i, j, k) along which the regions having constant $\mu$ values $\mu(i)$ and $\mu(j)$ meet, weighted by the absolute difference of these values. In case of an image, if we consider the pixels as the piecewise constant regions and the common arc length for adjacent pixels as unity, then the perimeter of an image is defined by

$$p(\mu) = \sum_{i,j} |\mu(i) - \mu(j)| \tag{15}$$

where $\mu(i)$ and $\mu(j)$ are the membership values of two adjacent pixels.

For the fuzzy subset $\mu$ of example 1, the perimeter is

p($\mu$) = |0.2 - 0.4| + |0.2 - 0.2| + |0.4 - 0.3| + |0.4 - 0.7|
       + |0.3 - 0.6| + |0.2 - 0.6| + |0.2 - 0.7| + |0.7 - 0.6|
       + |0.7 - 0.5| + |0.6 - 0.6| + |0.6 - 0.5| + |0.5 - 0.6|
       = 2.3


C. Compactness The compactness of a fuzzy set $\mu$ having an area of $a(\mu)$ and a perimeter of $p(\mu)$ is defined as [5]

$$comp(\mu) = \frac{a(\mu)}{(p(\mu))^2} \tag{16}$$

Physically, compactness means the fraction of maximum area (that can be encircled by the perimeter) actually occupied by the object. In the nonfuzzy case the value of compactness is maximum for a circle and is equal to $1/(4\pi)$ (area $\pi r^2$ divided by the squared perimeter $(2\pi r)^2$). In case of a fuzzy disc, where the membership value is only dependent on its distance from the center, this compactness value is $\ge 1/(4\pi)$ [6]. Of all possible fuzzy discs compactness is therefore minimum for its crisp version.

For the fuzzy subset $\mu$ of example 1, comp($\mu$) = 4.1/(2.3*2.3) = 0.775.

D. Height and Width The height of a fuzzy set $\mu$ is defined as [5]

$$h(\mu) = \int \max_m \mu_{mn} \, dn \tag{17}$$

where the integration is taken over a region outside which $\mu_{mn} = 0$.

Similarly the width of the fuzzy set is defined by

$$w(\mu) = \int \max_n \mu_{mn} \, dm \tag{18}$$

with the same condition over integration as above. For digital pictures m and n can take only discrete values, and since $\mu = 0$ outside the bounded region, the max operators are taken over a finite set. In this case the definitions take the form

$$h(\mu) = \sum_n \max_m \mu_{mn} \tag{19}$$

and

$$w(\mu) = \sum_m \max_n \mu_{mn} \tag{20}$$

m = 1, 2, . . . M; n = 1, 2, . . . N

So physically, in case of a digital picture, height is the sum of the maximum membership values of each row. Similarly, by width we mean the sum of the maximum membership values of each column.

For the fuzzy subset $\mu$ of example 1, height is h($\mu$) = 0.4+0.7+0.6 = 1.7 and width is w($\mu$) = 0.6+0.7+0.6 = 1.9.
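The worked values for Example 1 can be reproduced with a short Python sketch. The function names and the list-of-rows representation are assumptions of this illustration. As in the worked perimeter above, only differences between horizontally and vertically adjacent pixels inside the array are counted.

```python
# Membership plane of Example 1, stored as a list of rows.
MU = [
    [0.2, 0.4, 0.3],
    [0.2, 0.7, 0.6],
    [0.6, 0.5, 0.6],
]

def area(mu):
    """Equation (13): sum of all membership values."""
    return sum(sum(row) for row in mu)

def perimeter(mu):
    """Equation (15): sum of |difference| over adjacent pixel pairs."""
    rows, cols = len(mu), len(mu[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:                      # right-hand neighbour
                total += abs(mu[i][j] - mu[i][j + 1])
            if i + 1 < rows:                      # neighbour below
                total += abs(mu[i][j] - mu[i + 1][j])
    return total

def compactness(mu):
    """Equation (16): area over squared perimeter (undefined if perimeter is 0)."""
    return area(mu) / perimeter(mu) ** 2

def height(mu):
    """Sum of the maximum membership value of each row."""
    return sum(max(row) for row in mu)

def width(mu):
    """Sum of the maximum membership value of each column."""
    return sum(max(col) for col in zip(*mu))

a, p, comp = area(MU), perimeter(MU), compactness(MU)
h, w = height(MU), width(MU)
```

Up to floating-point rounding, the results agree with the values 4.1, 2.3, 0.775, 1.7 and 1.9 quoted in the text.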


E. Length and Breadth The length of a fuzzy set $\mu$ is defined as [7]

$$l(\mu) = \max_m \left( \int \mu_{mn} \, dn \right) \tag{21}$$

where the integration is taken over the region outside which $\mu_{mn} = 0$. In case of a digital picture where m and n can take only discrete values the expression takes the form

$$l(\mu) = \max_m \left( \sum_n \mu_{mn} \right) \tag{22}$$

Physically speaking, the length of an image fuzzy subset gives its longest expansion in the column direction. If $\mu$ is crisp, $\mu_{mn} = 0$ or 1; in this case length is the maximum number of pixels in a column. Comparing equation (22) with (19) we notice that the length is different from height in the sense that the former takes the summation of the entries in a column first and then maximizes over different columns, whereas the latter maximizes the entries in a row and then sums over different rows.

The breadth of a fuzzy set $\mu$ is defined as

$$b(\mu) = \max_n \left( \int \mu_{mn} \, dm \right) \tag{23}$$

where the integration is taken over the region outside which $\mu_{mn} = 0$. In case of a digital picture the expression takes the form

$$b(\mu) = \max_n \left( \sum_m \mu_{mn} \right) \tag{24}$$

Physically speaking, the breadth of an image fuzzy subset gives its longest expansion in the row direction. If $\mu$ is crisp, $\mu_{mn} = 0$ or 1; in this case breadth is the maximum number of pixels in a row. The difference between width and breadth is the same as that between height and length.

For the fuzzy subset $\mu$ in example 1, length is l($\mu$) = 0.4 + 0.7 + 0.5 = 1.6 and breadth is b($\mu$) = 0.6 + 0.5 + 0.6 = 1.7.

F. Index of Area Coverage (IOAC) The index of area coverage of a fuzzy set may be defined as [7]

$$IOAC(\mu) = \frac{area(\mu)}{l(\mu) * b(\mu)} \tag{25}$$


In the nonfuzzy case, the IOAC has a value of 1 for a rectangle (placed along the axes of measurement). For a circle this value is $\pi r^2 / (2r * 2r) = \pi / 4$. Physically, by IOAC of a fuzzy image we mean the fraction (which may be improper also) of the maximum area (that can be covered by the length and breadth of the image) actually covered by the image.

For the fuzzy subset $\mu$ of example 1, the maximum area that can be covered by its length and breadth is 1.6 * 1.7 = 2.72 whereas the actual area is 4.1, so the IOAC = 4.1 / 2.72 = 1.51.

It is to be noted that

$$l(X)/h(X) \le 1 \tag{26}$$

$$b(X)/w(X) \le 1 \tag{27}$$

When equality holds for (26) or (27) the object is either vertically or horizontally oriented.
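Length, breadth and IOAC for Example 1, together with the inequalities (26) and (27), can be checked with the following Python sketch. Function names and the list-of-rows layout are assumptions of this illustration.

```python
# Membership plane of Example 1, stored as a list of rows.
MU = [
    [0.2, 0.4, 0.3],
    [0.2, 0.7, 0.6],
    [0.6, 0.5, 0.6],
]

def area(mu):
    """Sum of all membership values."""
    return sum(sum(row) for row in mu)

def length(mu):
    """Largest column sum: sum within each column, then maximise."""
    return max(sum(col) for col in zip(*mu))

def breadth(mu):
    """Largest row sum: sum within each row, then maximise."""
    return max(sum(row) for row in mu)

def height(mu):
    """Sum of row maxima."""
    return sum(max(row) for row in mu)

def width(mu):
    """Sum of column maxima."""
    return sum(max(col) for col in zip(*mu))

def ioac(mu):
    """Equation (25): area divided by length times breadth."""
    return area(mu) / (length(mu) * breadth(mu))

l, b, coverage = length(MU), breadth(MU), ioac(MU)
ratio_26 = l / height(MU)   # should not exceed 1
ratio_27 = b / width(MU)    # should not exceed 1
```

Here the IOAC exceeds 1, which is the "improper fraction" situation mentioned in the text: the sum of all memberships is larger than the product of the largest column sum and the largest row sum.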

G. Degree of Adjacency The degree to which two regions S and T of an image are adjacent is defined as

$$a(S, T) = \frac{1}{|BP(S)|} \sum_{p \in BP(S)} \frac{1}{1 + |\mu(p) - \mu(q)|} * \frac{1}{1 + d(p)} \tag{28}$$

Here d(p) is the shortest distance between p and q, q is a border pixel (BP) of T and p is a border pixel of S. The other symbols have the same meaning as in the previous discussion.

The degree of adjacency of two regions is maximum (= 1) only when they are physically adjacent, i.e., d(p) = 0, and their membership values are also equal, i.e., $\mu(p) = \mu(q)$. If two regions are physically adjacent then their degree of adjacency is determined only by the difference of their membership values. Similarly, if the membership values of two regions are equal, their degree of adjacency is determined by their physical distance only.


IMAGE PROCESSING OPERATIONS 


In this section we will be explaining how the various grayness and geometrical 
ambiguity measures can be used for image enhancement, segmentation, edge 
detection and skeleton extraction problems. The algorithms which will be described 
here provide both fuzzy and nonfuzzy (as a special case) outputs. 





Segmentation and Object Extraction 

The problem of grey level thresholding plays an important role in image processing. For example, in enhancing contrast in an image we need to select proper threshold levels from its histogram so that some suitable non-linear transformation can highlight a desirable set of pixel intensities compared to others. Similarly, in image segmentation one needs proper histogram thresholding whose objective is to establish boundaries in order to partition the image space into meaningful regions. This section illustrates an application of the theory of fuzzy sets to make this task automatic so that an optimum threshold (or set of thresholds) may be estimated without the need to refer directly to the histogram.

Criteria for Threshold Selection 

Let us consider, first of all, the parameters $\gamma(X)$ or H(X) to explain the criterion of thresholding.

Consider the standard S-function [8]

$$\mu_{mn} = \mu(x_{mn}) = S(x_{mn}; a, b, c) = 0, \quad x_{mn} \le a \tag{29a}$$

$$= 2[(x_{mn} - a)/(c - a)]^2, \quad a \le x_{mn} \le b \tag{29b}$$

$$= 1 - 2[(x_{mn} - c)/(c - a)]^2, \quad b \le x_{mn} \le c \tag{29c}$$

$$= 1, \quad x_{mn} \ge c \tag{29d}$$

with b = (a + c)/2, b - a = c - b = $\Delta b$,

for obtaining the $\mu_{mn}$ plane from the spatial $x_{mn}$ plane of the image X and for computing $\gamma(X)$ and H(X) values from Eqs. (2), (3) and (4). The parameter b is the cross-over point, i.e., S(b; a, b, c) = 0.5. $\Delta b$ is the bandwidth. This is explained in Fig. 2 for an L-level image. Such a $\mu$ plane may be viewed to represent a fuzzy set "bright image" so that the degree of brightness of a pixel increases with its gray value.
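Equation (29) translates directly into code. The Python sketch below is illustrative only; the name `s_function` and the choice to pass a and c (deriving b as their midpoint) are assumptions made here.

```python
def s_function(x, a, c):
    """Standard S function of equation (29).

    a and c are the end points of the transition band; the cross-over point
    is b = (a + c) / 2, where the function takes the value 0.5.
    """
    b = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0

# Membership of each gray level of a 32-level image for a = 8, c = 24 (b = 16).
memberships = [s_function(level, 8, 24) for level in range(32)]
```

With these hypothetical parameters the bandwidth is $\Delta b = 8$; gray levels at or below 8 get membership 0, level 16 gets 0.5, and levels at or above 24 get membership 1.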



Figure 2 Standard S function for an L-level image. (Horizontal axis: gray level $x_{mn}$, marked at 0, a, b, c and L-1.)

For a particular cross-over point, say, $b = l_c$, we have $\mu_X(l_c) = 0.5$ and the $\mu_{mn}$ plane would contain values > 0.5 or < 0.5 corresponding to $x_{mn} > l_c$ or $< l_c$. The terms $\gamma(X)$ and H(X) then measure the average ambiguity in X by computing $\mu_{X \cap \bar{X}}(x_{mn})$ or $S_n(\mu_X(x_{mn}))$, which is 0 if $\mu_X(x_{mn}) = 0$ or 1 and is maximum for $\mu_X(x_{mn}) = 0.5$.

The selection of a cross-over point at $b = l_c$ implies the allocation of grey levels $< l_c$ and $> l_c$ within the two clusters, namely, background and object of a bimodal image. The contribution of the levels towards $\gamma(X)$ and H(X) is mostly from those around $l_c$ and would decrease as we move away from $l_c$. Again, since the nearest ordinary plane $\underline{X}$ (which gives the two-tone version of X) is dependent on the position of the cross-over point, a proper selection of b may therefore be obtained which will result in appropriate segmentation of object and background. In other words, if the grey level of image X has a bimodal distribution, then the above criteria for different values of b would result in a minimum $\gamma$ or H value only when b corresponds to the appropriate boundary between the two clusters.

For such a position of the threshold (cross-over point), there will be a minimum number of pixel intensities in X having $\mu_{mn} \approx 0.5$ (resulting in $\gamma$ or H $\approx$ 1) and a maximum number of pixel intensities having $\mu_{mn} \approx 0$ or 1 (resulting in $\gamma$ or H $\approx$ 0), thus contributing least towards $\gamma(X)$ or H(X). This optimum (minimum) value would be greater for any other selection of the cross-over point.

This suggests that modification of the cross-over point will result in variation of the parameters $\gamma(X)$ and H(X), and so an optimum threshold may be estimated for automatic histogram-thresholding problems without the need to refer directly to the histogram of X. The above concept can also be extended to an image having a multimodal distribution in grey levels, in which one would have several minima in $\gamma$ and H values corresponding to different threshold points in the histogram.

Let us now consider the geometrical parameters comp(X) and IOAC(X) (equations 16 and 25). It has been noticed that for crisp sets the value of the index of area coverage (IOAC) is maximum for a rectangle. Again, of all possible fuzzy rectangles IOAC is minimum for its crisp version. Similarly, in a nonfuzzy case the compactness is maximum for a circle and of all possible fuzzy discs compactness is minimum for its crisp version [6]. For this reason, we will use minimization (rather than maximization) of fuzzy compactness/IOAC as a criterion for image segmentation [9].

Suppose we use equation (29) for obtaining the 'bright image' $\mu(X)$ of an image X. Then for a particular cross over point of the S function, comp($\mu$) and IOAC($\mu$) reflect the average amount of ambiguity in the geometry (i.e., in the spatial domain) of X. Therefore, modification of the cross over point will result in different $\mu(X)$ planes (and hence different segmented versions), with varying amounts of compactness or IOAC denoting fuzziness in the spatial domain. The $\mu(X)$ plane having the minimum IOAC or compactness value can be regarded as an optimum fuzzy segmented version of X.

For obtaining the nonfuzzy threshold one may take the cross over point (which is considered to be the maximum ambiguous level) as the threshold between object and background. For images having multiple regions, one would have a set of such optimum $\mu(X)$ planes. The algorithm developed using these criteria is given below.

Algorithm 1

Given an L level image X of dimension M x N with minimum and maximum gray values $l_{min}$ and $l_{max}$ respectively.

Step 1: Construct the membership plane using equation (29) as

$$\mu_{mn} = \mu(l) = S(l; a, b, c)$$

(called bright image plane if the object regions possess higher gray values) or

$$\mu_{mn} = \mu(l) = 1 - S(l; a, b, c)$$

(called dark image plane if the object regions possess lower gray values) with cross-over point b and a bandwidth $\Delta b$.

Step 2: Compute $\gamma(X)$, H(X), comp(X) and IOAC(X).

Step 3: Vary b between $l_{min}$ and $l_{max}$ and select those b for which I(X) (where I(X) denotes one of the aforesaid measures or a combination of them) has local minima. Among the local minima let the global one have a cross over point s.

The level s, therefore, denotes the cross over point of the fuzzy image plane $\mu_{mn}$ which has minimum grayness and/or geometrical ambiguity. The $\mu_{mn}$ plane then can be viewed as a fuzzy segmented version of the image X. For the purpose of nonfuzzy segmentation, we can take s as the threshold or boundary for classifying or segmenting the image into object and background.

The measure I(X) in Step 3 can represent either grayness ambiguity (i.e., $\gamma(X)$ or H(X)) or geometrical ambiguity (i.e., comp(X) or IOAC(X) or a(S,T)) or both (i.e., product of grayness and geometrical ambiguities).
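The three steps of Algorithm 1 can be sketched in Python for the simplest choice of I(X), the index of fuzziness. This is a simplified illustration under stated assumptions, not the published implementation: $\gamma(X)$ is taken as $(2/MN) \sum \min(\mu_{mn}, 1 - \mu_{mn})$, only the global minimum over integer cross-over points is returned (local minima are not listed), and the names `index_of_fuzziness` and `select_threshold` as well as the toy image are hypothetical.

```python
def s_function(x, a, c):
    """Standard S function of equation (29) with cross-over b = (a + c) / 2."""
    b = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0

def index_of_fuzziness(image, b, delta_b):
    """Steps 1 and 2: build the bright image plane and measure its ambiguity."""
    total, count = 0.0, 0
    for row in image:
        for level in row:
            mu = s_function(level, b - delta_b, b + delta_b)
            total += min(mu, 1.0 - mu)   # unchanged for the dark plane 1 - mu
            count += 1
    return 2.0 * total / count

def select_threshold(image, delta_b):
    """Step 3: vary b from l_min to l_max and keep the least ambiguous one."""
    levels = [v for row in image for v in row]
    scores = {b: index_of_fuzziness(image, b, delta_b)
              for b in range(min(levels), max(levels) + 1)}
    best = min(scores, key=scores.get)
    return best, scores

# Toy image with a dark cluster (levels 2-4) and a bright cluster (12-14).
image = [
    [2, 3, 4, 12],
    [3, 2, 13, 14],
    [4, 12, 13, 14],
    [2, 3, 13, 12],
]
threshold, scores = select_threshold(image, delta_b=3)
```

For this toy image the selected cross-over point falls in the empty gap between the two clusters, where every pixel receives membership 0 or 1 and the ambiguity vanishes.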

Faster Method of Computation 

From Algorithm 1 it appears that one needs to scan an L level image L times (corresponding to L cross over points of the membership function) for computing the parameters for detecting its threshold. The time of computation can be reduced significantly by scanning it only once for computing its co-occurrence matrix, row histogram and column histogram, and by computing $\mu(l)$, l = 1, 2, . . . L every time with the membership function of a particular cross over point.

The computations of $\gamma(X)$ (or H(X)), a(X), p(X), l(X) and b(X) can be made faster in the following way. Let h(i), i = 1, 2, . . . L be the number of occurrences of the level i, c[i, j], i = 1, 2, . . . L, j = 1, 2, . . . L the co-occurrence matrix and $\mu(i)$, i = 1, 2, . . . L the membership vector for a fixed cross over point of an L level image X.

Determine $\gamma(X)$, area and perimeter as

$$\gamma(X) = \frac{2}{MN} \sum_{i=1}^{L} T(i) h(i) \tag{30a}$$

$$T(i) = \min\{\mu(i), 1 - \mu(i)\} \tag{30b}$$

$$a(X) = \sum_{i=1}^{L} h(i) \cdot \mu(i) \tag{31}$$

$$p(X) = \sum_{i=1}^{L} \sum_{j=1}^{L} c[i, j] \cdot |\mu(i) - \mu(j)| \tag{32}$$

For calculating length and breadth the following steps can be used. Compute the row histogram R[m, l], m = 1, . . . M, l = 1, . . . L, where R[m, l] represents the number of occurrences of the gray level l in the mth row of the image. Find the column histogram C[n, l], n = 1, . . . N, l = 1, . . . L, where C[n, l] represents the number of occurrences of the gray level l in the nth column of the image. Calculate length and breadth as

$$l(X) = \max_n \sum_{l=1}^{L} C[n, l] \cdot \mu(l) \tag{33}$$

$$b(X) = \max_m \sum_{l=1}^{L} R[m, l] \cdot \mu(l) \tag{34}$$
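The single-scan scheme of equations (30)-(34) can be sketched as follows. This is an illustration under assumptions: gray levels are indexed from 0 rather than 1, the co-occurrence matrix counts each horizontally or vertically adjacent pixel pair once, and the names `scan_once` and `measures` are hypothetical. The test image encodes Example 1 as gray levels 0-5 with an illustrative membership vector (not one produced by an S function), so the geometric results can be compared with the values worked out earlier.

```python
def scan_once(image, n_levels):
    """One pass over the image: histogram, co-occurrence, row/column histograms."""
    rows, cols = len(image), len(image[0])
    hist = [0] * n_levels
    cooc = [[0] * n_levels for _ in range(n_levels)]
    row_hist = [[0] * n_levels for _ in range(rows)]
    col_hist = [[0] * n_levels for _ in range(cols)]
    for m in range(rows):
        for n in range(cols):
            g = image[m][n]
            hist[g] += 1
            row_hist[m][g] += 1
            col_hist[n][g] += 1
            if n + 1 < cols:                     # right-hand neighbour
                cooc[g][image[m][n + 1]] += 1
            if m + 1 < rows:                     # neighbour below
                cooc[g][image[m + 1][n]] += 1
    return hist, cooc, row_hist, col_hist

def measures(stats, mu, n_pixels):
    """Evaluate equations (30)-(34) for one membership vector mu."""
    hist, cooc, row_hist, col_hist = stats
    levels = range(len(mu))
    gamma = 2.0 / n_pixels * sum(min(mu[i], 1.0 - mu[i]) * hist[i] for i in levels)
    area = sum(hist[i] * mu[i] for i in levels)
    perim = sum(cooc[i][j] * abs(mu[i] - mu[j]) for i in levels for j in levels)
    length = max(sum(col[l] * mu[l] for l in levels) for col in col_hist)
    breadth = max(sum(row[l] * mu[l] for l in levels) for row in row_hist)
    return gamma, area, perim, length, breadth

# Example 1 encoded as gray levels; mu maps level -> membership.
image = [[0, 2, 1], [0, 5, 4], [4, 3, 4]]
mu = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

stats = scan_once(image, n_levels=6)        # done once per image
gamma, area, perim, length, breadth = measures(stats, mu, n_pixels=9)
```

Only `measures` has to be re-run when the cross-over point (and hence the membership vector) changes; the statistics returned by `scan_once` are reused.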


Some Remarks 

The grayness ambiguity measure, e.g., $\gamma(X)$ or H(X), basically sharpens the histogram of X using its global information only, and it detects a single threshold in its valley region. Therefore, if the histogram does not have a valley, the above measures will not be able to select a threshold for partitioning the histogram. This can readily be seen from Equation (30), which shows that the minima of the $\gamma(X)$ measure will only correspond to those regions of gray level which have minimum occurrences (i.e., valley region). comp(X) or IOAC(X), on the other hand, uses local information to determine the fuzziness in the spatial domain of an image. As a result, these are expected to result in better segmentation by detecting thresholds even in the absence of a valley in the histogram.

Again, the comp(X) measure attempts to make a circular approximation of the object region for its extraction, whereas IOAC(X) goes by the rectangular approximation. Their suitability to an image should therefore be guided by this criterion.

Choice of Membership Function 

In the aforesaid algorithm, $w = 2\Delta b$ is the length of the interval which is shifted over the entire dynamic range of the gray scale. As w decreases, the $\mu(x_{mn})$ plane would have more intensified contrast around the cross-over point, resulting in a decrease of ambiguity in X. As a result, the possibility of detecting some undesirable thresholds (spurious minima) increases because of the smaller value of $\Delta b$. On the other hand, an increase of w results in a higher value of fuzziness and thus leads towards the possibility of losing some of the weak minima.

The criteria regarding the selection of the membership function and the length of the window (i.e., w) have been reported recently by Murthy and Pal [10], assuming continuous functions for both histogram and membership function. For a fuzzy set "bright image plane", the membership function $\mu: [0, w] \to [0, 1]$ should be such that

i) $\mu$ is continuous, $\mu(0) = 0$, $\mu(w) = 1$

ii) $\mu$ is monotonically non-decreasing, and

iii) $\mu(x) = 1 - \mu(w - x)$ for all $x \in [0, w]$, where w > 0 is the length of the window.

Furthermore, $\mu$ should satisfy the bound criteria derived based on the correlation measure (equation 7). The main properties on which correlation was formulated are

$P_1$: If for higher values of $\mu_1$, $\mu_2$ takes higher values and for lower values of $\mu_1$, $\mu_2$ also takes lower values then $C(\mu_1, \mu_2) > 0$

$P_2$: If $\mu_1 \uparrow$ and $\mu_2 \uparrow$ then $C(\mu_1, \mu_2) > 0$

$P_3$: If $\mu_1 \uparrow$ and $\mu_2 \downarrow$ then $C(\mu_1, \mu_2) < 0$

[$\uparrow$ denotes increases and $\downarrow$ denotes decreases].

It is to be mentioned that $P_2$ and $P_3$ should not be considered in isolation of $P_1$. Had this been the case, one can cite several examples when $\mu_1 \uparrow$ and $\mu_2 \uparrow$ but $C(\mu_1, \mu_2) < 0$, and $\mu_1 \uparrow$ and $\mu_2 \downarrow$ but $C(\mu_1, \mu_2) > 0$. Subsequently, the types of membership functions which should not be considered in fuzzy set theory are categorized with the help of correlation. Bound functions $h_1$ and $h_2$ are accordingly derived [11]. They are

$$h_1(x) = 0, \quad 0 \le x \le \epsilon \tag{35}$$
$$\phantom{h_1(x)} = x - \epsilon, \quad \epsilon \le x \le 1$$

$$h_2(x) = x + \epsilon, \quad 0 \le x \le 1 - \epsilon \tag{36}$$
$$\phantom{h_2(x)} = 1, \quad 1 - \epsilon \le x \le 1$$

where $\epsilon = 0.25$. The bounds for the membership function $\mu$ are such that

$$h_1(x) \le \mu(x) \le h_2(x) \quad \text{for } x \in [0, 1].$$

For x belonging to any arbitrary interval, the bound functions will be changed proportionately. For $h_1 \le \mu \le h_2$, $C(h_1, h_2) \ge 0$, $C(h_1, \mu) \ge 0$ and $C(h_2, \mu) \ge 0$.

The function $\mu$ lying in between $h_1$ and $h_2$ does not have most of its variation concentrated (i) in a very small interval, (ii) towards one of the end points of the interval under consideration and (iii) towards both the end points of the interval under consideration.

Figure 3 shows such bound functions. It is to be noted that Zadeh's standard S function (equation 29) satisfies these bounds.
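The statement that the standard S function stays within the bounds can be checked numerically. The Python sketch below is an illustration under assumptions: the S function is taken on the unit interval with a = 0, b = 0.5, c = 1, the check is made on a grid of 1001 points rather than analytically, and the names `h1`, `h2` and `s_unit` are hypothetical.

```python
EPS = 0.25  # value of epsilon given with equations (35) and (36)

def h1(x):
    """Lower bound function of equation (35) on [0, 1]."""
    return max(0.0, x - EPS)

def h2(x):
    """Upper bound function of equation (36) on [0, 1]."""
    return min(1.0, x + EPS)

def s_unit(x):
    """Standard S function S(x; 0, 0.5, 1) on the unit interval."""
    if x <= 0.0:
        return 0.0
    if x <= 0.5:
        return 2.0 * x * x
    if x < 1.0:
        return 1.0 - 2.0 * (x - 1.0) ** 2
    return 1.0

# Evaluate the three functions on a regular grid over [0, 1].
grid = [k / 1000.0 for k in range(1001)]
within_bounds = all(h1(x) <= s_unit(x) <= h2(x) for x in grid)

# A step-like membership function, by contrast, concentrates all of its
# variation at one point and leaves the band.
step_within_bounds = all(h1(x) <= (1.0 if x >= 0.5 else 0.0) <= h2(x) for x in grid)
```

On this grid the S function lies inside the band at every point, whereas the crisp step function violates the upper bound just above 0.5 and the lower bound just below it.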

It has been shown [10] that for detecting a minimum in the valley region of a histogram, the window length w of the $\mu$ function should be less than the distance between two peaks around that valley region.

Figure 3 Bound Functions for $\mu(x)$.

$H^r$ as an Objective Criterion

Let us now explain another way of extracting an object by minimizing higher order fuzzy entropy (equation 5) of both object and background regions. Before explaining the algorithm, let us describe the membership function and its selection procedure.

Let s be an assumed threshold which partitions the image X into two parts, namely, object and background. Suppose the gray level ranges [1, s] and [s + 1, L] denote, respectively, the object and background of the image X. An inverse $\pi$-type function as shown by the solid line in Figure 4 is used here to obtain the $\mu_{mn}$ values of X. The inverse $\pi$-type function is seen (from Fig. 4) to be generated by taking the union of S(x; (s - (L - s)), s, L) and 1 - S(x; 1, s, (s + s - 1)), where S denotes the standard S function defined by Zadeh (equation 29).

The resulting function, as shown by the solid line, makes $\mu$ lie in [0.5, 1]. Since the ambiguity (difficulty) in deciding a level as a member of the object or the background is maximum for the boundary level s, it has been assigned a membership value of 0.5 (i.e., cross-over point). Ambiguity decreases (i.e., degree of belongingness to either object or background increases) as the gray value moves away from s on either side. The $\mu_{mn}$ thus obtained denotes the degree of belongingness of a pixel $x_{mn}$ to either object or background.

Since s is not necessarily the mid point of the entire gray scale, the membership function (solid line in Fig. 4) may not be a symmetric one. It is further to be noted that one may use any linear or nonlinear equation (instead of Zadeh's standard S function) to represent the membership function in Fig. 4. Unlike Algorithm 1, the membership function does not need any parameter selection to control the output.
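The construction of the inverse $\pi$-type function from two S functions can be sketched in Python as below. The sketch assumes that the fuzzy union is taken as the pointwise maximum; the names `s_function` and `inverse_pi` and the example values s = 8, L = 32 are illustrative choices, not taken from the text.

```python
def s_function(x, a, c):
    """Standard S function of equation (29) with cross-over b = (a + c) / 2."""
    b = (a + c) / 2.0
    if x <= a:
        return 0.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    if x < c:
        return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2
    return 1.0

def inverse_pi(x, s, n_levels):
    """Inverse pi-type membership for an assumed threshold s, 1 < s < n_levels.

    Built as the union (here: maximum) of
      S(x; s - (L - s), s, L)      rising branch towards the top level L
      1 - S(x; 1, s, s + s - 1)    falling branch starting from level 1
    """
    rising = s_function(x, s - (n_levels - s), n_levels)
    falling = 1.0 - s_function(x, 1, s + s - 1)
    return max(rising, falling)

# Membership of every gray level 1..32 for an assumed threshold s = 8.
L_LEVELS, S_LEVEL = 32, 8
plane = [inverse_pi(level, S_LEVEL, L_LEVELS) for level in range(1, L_LEVELS + 1)]
```

The resulting values equal 0.5 at the assumed threshold, rise to 1 at both ends of the gray scale, and are asymmetric about s because s is not the mid point of the scale.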

Algorithm 2

Assume a threshold s, 1 < s < L and execute the following steps.

Step 1: Apply an inverse $\pi$-type function [Fig. 4] to get the fuzzy $\mu_{mn}$ plane, with $\mu_{mn} \in [0.5, 1]$. (The membership function is in general asymmetric.)

Step 2: Compute the rth order fuzzy entropy of the object $H_O^r$ and the background $H_B^r$ considering only the spatially adjacent sequences of pixels present within the object and background respectively. Use the 'min' operator to get the membership value of a sequence of pixels.

Step 3: Compute the total rth order fuzzy entropy of the partitioned image as $H_s^r = H_O^r + H_B^r$.

Step 4: Minimize $H_s^r$ with respect to s to get the threshold for object background classification.

Figure 4 Inverse $\pi$ function (solid line) for computing object and background entropy.

Referring back to Table 1, we have seen that $H^2$ reflects the homogeneity among the supports in a set in a better way than $H^1$ does. The higher the value of r, the stronger is the validity of this fact. Thus, considering the problem of object-background classification, $H^r$ seems to be more sensitive (as r increases) to the selection of an appropriate threshold; i.e., the improper selection of the threshold is more strongly reflected by $H^r$ than by $H^{r-1}$. For example, the thresholds obtained by the $H^2$ measure have more validity than those by $H^1$ (which only takes into account the histogram information). Similar arguments hold good for even higher order (r > 2) entropy.

Example 2 

Figures 5 and 6 show the images of Lincoln and a blurred chromosome along with
their histograms. Table 3 shows the thresholds obtained by the comp(X) and IOAC(X)
measures for various window sizes w when Zadeh's S function is used as the membership
function. The Lincoln image is 64x64 with 32 gray levels, whereas the chromosome
image is 64x64 with 64 gray levels.







Figure 7(a) Threshold = 10.
Figure 7(b) Threshold = 32.
Figure 7(c) Threshold = 56.

Table 3 Various Thresholds (* denotes


The threshold produced by the entropy measure of Algorithm 2 is 8 for the Lincoln
image. Some typical nonfuzzy thresholded outputs of these images are shown in
Figure 7. Recently, transitional correlation and within-class correlation have been
defined [12] based on equation (7) for image segmentation, taking both local and
global information into account.


[Table 3: columns w, Lincoln, Comp, IOAC; surviving entries in source order:
10, 11*23, 10, 12, 16, 55, 20, 54]
Image Enhancement

The object of an enhancement technique is to process a given image so that the
result is more suitable than the original for a specific application. The term
'specific' is, of course, problem oriented. The techniques used here are based on the
modification of pixels in the fuzzy property domain of an image. Three kinds of
enhancement operations, namely contrast enhancement, smoothing and edge detection,
will be discussed here.

Enhancement in Property Domain [13-16] 

The contrast intensification operator on a fuzzy set A generates another fuzzy set
A' = INT(A) in which the fuzziness is reduced by increasing the values of $\mu_A(x)$
which are above 0.5 and decreasing those which are below it. Define this INT
operator by a transformation $T_1$ of the membership function $\mu_{mn}$ as

$T_1(\mu_{mn}) = T_1'(\mu_{mn}) = 2\mu_{mn}^2, \quad 0 \le \mu_{mn} \le 0.5$    (37a)

$\qquad\qquad = T_1''(\mu_{mn}) = 1 - 2(1 - \mu_{mn})^2, \quad 0.5 \le \mu_{mn} \le 1$    (37b)

m = 1, 2, . . . M, n = 1, 2, . . . N

In general, each $\mu_{mn}$ in X (Eq. 1) may be modified to $\mu'_{mn}$ to enhance
the image X in the property domain by a transformation function $T_r$ where

$\mu'_{mn} = T_r(\mu_{mn}) = T_r'(\mu_{mn}), \quad 0 \le \mu_{mn} \le 0.5$    (38a)

$\qquad\qquad = T_r''(\mu_{mn}), \quad 0.5 \le \mu_{mn} \le 1$    (38b)

r = 1, 2, ...

The transformation function $T_r$ is defined as successive applications of $T_1$ by the
recursive relationship

$T_s(\mu_{mn}) = T_1\{T_{s-1}(\mu_{mn})\}, \quad s = 1, 2, \ldots$    (39)

and $T_1(\mu_{mn})$ represents the operator INT defined in (37).

This is shown graphically in Figure 8. As r increases, the curve tends to be
steeper because of the successive application of INT. In the limiting case, as
$r \to \infty$, $T_r$ produces a two-level (binary) image. It is to be noted here that,
corresponding to a particular operation of T', one can use any of the multiple
operations of T'', and vice versa, to attain a desired amount of enhancement. It
is up to the user to interpret and exploit this flexibility depending on the
problem at hand. It is further to be noted from equation 8(c) that H(X) or $\gamma(X)$ of
an image decreases with its contrast enhancement [16].
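
The INT operator of (37) and its repeated application in (39) can be illustrated with
a minimal sketch. The function names `intensify` and `intensify_r` are hypothetical
and chosen only for this illustration; the membership value is assumed to be a plain
float in [0, 1].

```python
def intensify(mu):
    """One application of INT, Eq. (37): push mu away from 0.5."""
    if mu <= 0.5:
        return 2.0 * mu * mu                 # (37a): values below 0.5 decrease
    return 1.0 - 2.0 * (1.0 - mu) ** 2       # (37b): values above 0.5 increase


def intensify_r(mu, r):
    """r successive applications of INT, following the recursion in Eq. (39)."""
    for _ in range(r):
        mu = intensify(mu)
    return mu
```

The cross-over point 0.5 is left unchanged by the operator, and repeated application
drives every other value towards 0 or 1, which is the two-level limit described above.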

The membership plane $\mu_{mn}$ for enhancing contrast around a cross-over point
may be obtained from [13]

$\mu_{mn} = G(x_{mn}) = \left[ 1 + \left( \frac{|\hat{x} - x_{mn}|}{F_d} \right)^{F_e} \right]^{-1}$    (40)

where the position of cross-over points, bandwidth and hence the symmetry of the
curves are determined by the fuzzifiers $F_e$ and $F_d$. When $\hat{x} = x_{max}$ (maximum level
in X), $\mu_{mn}$ represents an S type function. When $\hat{x}$ = any arbitrary level $l$, $\mu_{mn}$
represents a π type function. Zadeh's standard functions do not have the provision
for controlling the cross-over point. The parameters $F_e$ and $F_d$ of equation (40) are
determined from the cross-over point across which contrast enhancement is desired.
After enhancement in the fuzzy property domain, the enhanced spatial domain
$x'_{mn}$ may be obtained from

$x'_{mn} = G^{-1}(\mu'_{mn}), \quad \alpha \le \mu'_{mn} \le 1$    (41)

where $\alpha$ is the value of $\mu_{mn}$ when $x_{mn} = 0$.

Figure 8 INT transformation function for contrast enhancement in property plane
(horizontal axis: $\mu_{mn}$ from 0 to 1).


Smoothing Algorithm 

The idea of smoothing is based on the property that image points which are
spatially close to each other tend to possess nearly equal grey levels. Smoothing of
an image X may be obtained by using q successive applications of 'min' and then
'max' operators within neighbors such that the smoothed grey level value of the
(m, n)th pixel is [13,14]

$x'_{mn} = \max_{Q_1}{}^{q} \, \min_{Q_1}{}^{q} \{x_{ij}\}, \quad (i, j) \ne (m, n), \ (i, j) \in Q_1, \ q = 1, 2, \ldots$    (42)

The smoothing operation blurs the image by attenuating the high spatial frequency
components associated with edges and other abrupt changes in grey levels. The
higher the values of $Q_1$ and q, the greater is the degree of blurring.
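
A minimal sketch of the min-max smoothing in (42) follows. It assumes, purely for
illustration, that the image is a list of rows of gray levels, that $Q_1$ is the set of
4-neighbours of a pixel (excluding the pixel itself), and that neighbours falling
outside the image are ignored; the names `neighbours`, `apply_op` and `smooth` are
hypothetical.

```python
def neighbours(img, m, n):
    """Gray levels of the 4-neighbours of (m, n) that lie inside the image."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i, j in ((m - 1, n), (m + 1, n), (m, n - 1), (m, n + 1))
            if 0 <= i < rows and 0 <= j < cols]


def apply_op(img, op):
    """Replace every pixel by op (min or max) taken over its neighbours."""
    return [[op(neighbours(img, m, n)) for n in range(len(img[0]))]
            for m in range(len(img))]


def smooth(img, q=1):
    """q successive 'min' passes followed by q successive 'max' passes."""
    out = img
    for _ in range(q):
        out = apply_op(out, min)
    for _ in range(q):
        out = apply_op(out, max)
    return out
```

For example, an isolated bright pixel in a uniform region is removed by a single
min pass and is not restored by the following max pass.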

Edge Detection 


If $x'_{mn}$ denotes the edge intensity corresponding to a pixel $x_{mn}$, then edges of the
image are defined as [13,15]

$\text{Edges} \triangleq \bigcup_m \bigcup_n x'_{mn}$    (43a)

where

$x'_{mn} = | x_{mn} - \min_Q \{x_{ij}\} |$    (43b)

or,

$x'_{mn} = | x_{mn} - \max_Q \{x_{ij}\} |$    (43c)

or,

$x'_{mn} = \max_Q \{x_{ij}\} - \min_Q \{x_{ij}\}, \quad (i, j) \in Q$    (43d)

Q is a set of N coordinates (i, j) which are on/within a circle of radius R centered at
the point (m, n). Equation (43c) as compared with (43b) causes the boundary to be
expanded by one pixel. Equation (43d), on the other hand, results in a boundary of
two-pixel width. It therefore appears from Eq. (43) that the better the contrast
enhancement between the regions, the easier is the detection and the higher is the
intensity of contours among them.
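
The three edge operators in (43b)-(43d) can be compared on a small step image with
the following sketch. It assumes the image is a list of rows and that Q contains
every in-image coordinate within Euclidean distance R of (m, n), the centre
included; the function names are hypothetical.

```python
def window(img, m, n, radius=1):
    """Gray levels at coordinates on or within a circle of given radius."""
    rows, cols = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, m - radius), min(rows, m + radius + 1))
            for j in range(max(0, n - radius), min(cols, n + radius + 1))
            if (i - m) ** 2 + (j - n) ** 2 <= radius ** 2]


def edge_min(img, m, n, radius=1):
    """Eq. (43b): |x_mn - min over Q|."""
    return abs(img[m][n] - min(window(img, m, n, radius)))


def edge_max(img, m, n, radius=1):
    """Eq. (43c): |x_mn - max over Q|."""
    return abs(img[m][n] - max(window(img, m, n, radius)))


def edge_range(img, m, n, radius=1):
    """Eq. (43d): max over Q minus min over Q."""
    q = window(img, m, n, radius)
    return max(q) - min(q)
```

On a vertical step between gray levels 0 and 9, (43b) responds on the bright side of
the step, (43c) on the dark side, and (43d) on both, which gives the two-pixel wide
boundary mentioned above.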

Other operations based on max and min operators are available in [17,18]. 
Automatic selection of an appropriate enhancement operator based on fuzzy geometry 
is available in [19]. 

Edginess Measure 

Let us now describe an edginess measure [20,21] based on $H^1$ (Equation 5) which
denotes the amount of difficulty in deciding whether a pixel can be called an edge or
not. Let $N^3_{x,y}$ be a 3 x 3 neighborhood of a pixel at (x, y) such that

$N^3_{x,y} = \{(x, y), (x-1, y), (x+1, y), (x, y-1), (x, y+1), (x-1, y-1),
(x-1, y+1), (x+1, y-1), (x+1, y+1)\}$    (44)

The edge-entropy $H^E_{x,y}$ of the pixel (x, y), giving a measure of edginess at
(x, y), may be computed as follows. For every pixel (x, y), compute the average,
maximum and minimum values of gray levels over $N^3_{x,y}$. Let us denote the average,
maximum and minimum values by Avg, Max, Min respectively. Now define the
following parameters.

D = max {Max - Avg, Avg - Min}    (45)

B = Avg    (46)

A = B - D    (47)

C = B + D    (48)

A π-type membership function is then used to compute $\mu_{xy}$ for all (x, y)
$\in N^3_{x,y}$, such that $\mu(A) = \mu(C) = 0.5$ and $\mu(B) = 1$. It is to be noted that
$\mu_{xy} \ge 0.5$. Such a $\mu_{xy}$, therefore, gives the degree to which a gray level is close
to the average value computed over $N^3_{x,y}$. In other words, it represents a fuzzy set
"pixel intensity close to its average value", averaged over $N^3_{x,y}$. When all pixel
values over $N^3_{x,y}$ are either equal or close to each other (i.e., they are within the
same region), such a transformation will make all $\mu_{xy}$ = 1 or close to 1. In other
words, if there is no edge, pixel values will be close to each other and the $\mu$
values will be close to one (1), thus resulting in a low value of $H^1$. On the other
hand, if there is an edge (dissimilarity in gray values over $N^3_{x,y}$), then the $\mu$
values will be further away from unity, thus resulting in a high value of $H^1$.
Therefore, the entropy $H^1$ over $N^3_{x,y}$ can be viewed as a measure of edginess
($H^E_{x,y}$) at the point (x, y). The higher the value of $H^E_{x,y}$, the stronger is the
edge intensity and the easier is its detection. As mentioned before, there are
several ways in which one can define a π-type function as shown in Fig. 9.
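
The parameters (45)-(48) and one possible π-type function can be sketched as below.
The piecewise-linear shape used here is only an assumption made for illustration
(the text leaves the exact form of the π function open); it is chosen so that
$\mu(A) = \mu(C) = 0.5$ and $\mu(B) = 1$. The entropy $H^1$ of Equation 5 is not reproduced
here. The names `pi_parameters` and `pi_membership` are hypothetical.

```python
def pi_parameters(levels):
    """Return (A, B, C) of Eqs. (45)-(48) for the gray levels of a neighbourhood."""
    avg = sum(levels) / len(levels)
    d = max(max(levels) - avg, avg - min(levels))   # Eq. (45)
    return avg - d, avg, avg + d                    # A (47), B (46), C (48)


def pi_membership(g, a, b, c):
    """Assumed linear pi-type function: 1 at B, falling to 0.5 at A and C."""
    d = c - b
    if d == 0:
        return 1.0          # uniform neighbourhood: every level equals Avg
    return max(0.0, 1.0 - 0.5 * abs(g - b) / d)
```

Because D is the largest deviation from the average, every gray level in the
neighbourhood lies in [A, C], so its membership is at least 0.5, in agreement with
the remark above.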



Figure 9 π function for computing edge entropy (points A, B, C marked on the
gray-level axis).


The proposed entropic measure is less sensitive to noise because of the use of a 
Fuzzy Skeletonization


The problem of skeletonization or thinning plays a key role in image analysis 
and recognition because of the simplicity of object representation it allows. Let us 
now explain a skeletonization technique [23] based on minimization of compactness 
property over the fuzzy core line plane. The output is fuzzy and one may obtain its 
nonfuzzy (crisp) single pixel width version by retaining only those pixels which have 
strong skeleton-membership value compared to their neighbors. 

Coreline Membership Plane 


After obtaining a fuzzy segmented version (as described before) of the input 
image X, the membership function of a pixel denoting the degree of its belonging to 
the subset 'Core line' (skeleton) is determined by three factors. These include the 
properties of possessing maximum intensity, and occupying vertically and 
horizontally middle positions from the edges (pixels beyond which the membership 
value in the fuzzy segmented image is zero) of the object.

Let $x_{max}$ be the maximum pixel intensity in the image and $\mu_o(x_{mn})$ be the
function which assigns the degree of possessing maximum brightness to the (m,n)th
pixel. Then the simplest way to define $\mu_o(x_{mn})$ is

$\mu_o(x_{mn}) = x_{mn} / x_{max}$    (49)

It is to be mentioned here that one may use other monotonically nondecreasing
functions to define $\mu_o(x)$ with a flexibility of varying cross-over point. Equation 49
is the simplest one with fixed cross-over point at $x_{max}/2$.

Let $x_1$ and $x_2$ be the distances of $x_{mn}$ from the left and right edges respectively.
(The distance being measured by the number of units separating the pixel under
consideration from the first background pixel along that direction.) Then $\mu_h(x_{mn})$,
denoting the degree of occupying the horizontally central position in the object, is
defined as

$\mu_h(x_{mn}) = \frac{x_1}{x_2}$    if $d(x_1, x_2) \le 1$ and $x_1 \le x_2$,

$\qquad = \frac{x_2}{x_1}$    if $d(x_1, x_2) \le 1$ and $x_1 > x_2$,

$\qquad = \frac{x_1}{x_2 (x_1 + x_2)}$    if $d(x_1, x_2) > 1$ and $x_1 \le x_2$,

$\qquad = \frac{x_2}{x_1 (x_1 + x_2)}$    if $d(x_1, x_2) > 1$ and $x_1 > x_2$,    (50)

where $d(x_1, x_2) = |x_1 - x_2|$.
Similarly, with $y_1$ and $y_2$ the corresponding distances along the vertical direction,
the vertical function is defined as

$\mu_v(x_{mn}) = \frac{y_1}{y_2}$    if $d(y_1, y_2) \le 1$ and $y_1 \le y_2$,

$\qquad = \frac{y_2}{y_1}$    if $d(y_1, y_2) \le 1$ and $y_1 > y_2$,

$\qquad = \frac{y_1}{y_2 (y_1 + y_2)}$    if $d(y_1, y_2) > 1$ and $y_1 \le y_2$,

$\qquad = \frac{y_2}{y_1 (y_1 + y_2)}$    if $d(y_1, y_2) > 1$ and $y_1 > y_2$.    (51)

Equations (50,51) assign high values (close to 1.0) for pixels near the core and low
values to pixels away from the core. The factor $(x_1 + x_2)$ or $(y_1 + y_2)$ in the
denominator takes into consideration the extent of the object segment so that there
is an appreciable amount of change in the property value for the pixels not
belonging to the core.

These primary membership functions $\mu_o$, $\mu_h$ and $\mu_v$ may be combined as either

$\mu_c(x_{mn}) = \max\{\min(\mu_o, \mu_h), \min(\mu_o, \mu_v), \min(\mu_h, \mu_v)\}$    (52)

or

$\mu_c(x_{mn}) = W_1 \mu_o + W_2 \mu_h + W_3 \mu_v$    (53a)

with $W_1 + W_2 + W_3 = 1$    (53b)

to define the grade of belonging of $x_{mn}$ to the subset 'Core Line' of the image.

Equation (52) involves connective properties using max and min operators such
that $\mu_c = 1$ when at least two of the three primary properties take values of unity.
All the three primary membership values are given equal weight in computing the
$\mu_c$ value. Equation (53), on the other hand, involves a weighted sum (weights being
denoted by $W_1$, $W_2$ and $W_3$). Usually, one can consider the weight attributed to
$\mu_o$ (property corresponding to pixel intensity) to be higher than the other two and
$W_2 = W_3$.

Equation (52) or (53) therefore extracts (using both gray level and spatial
information) the subset 'Core line' such that the membership value decreases as one
moves away towards the edges (boundary) of object regions.
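
The centrality functions (50)-(51) and the two combination rules (52)-(53) can be
sketched as follows. The distances are assumed to be positive integers, and the
default weights (0.5, 0.25, 0.25) are only an illustrative choice satisfying (53b)
with $W_2 = W_3$; the function names are hypothetical.

```python
def centrality(d1, d2):
    """Eq. (50)/(51): degree of being central between two opposite edges."""
    lo, hi = min(d1, d2), max(d1, d2)
    if hi - lo <= 1:
        return lo / hi                      # nearly central pixel
    return lo / (hi * (d1 + d2))            # off-centre: damped by segment extent


def core_max_min(mu_o, mu_h, mu_v):
    """Eq. (52): max of the pairwise minima of the three primary memberships."""
    return max(min(mu_o, mu_h), min(mu_o, mu_v), min(mu_h, mu_v))


def core_weighted(mu_o, mu_h, mu_v, w=(0.5, 0.25, 0.25)):
    """Eq. (53a) with weights w summing to one, Eq. (53b)."""
    return w[0] * mu_o + w[1] * mu_h + w[2] * mu_v
```

For instance, a pixel 3 units from both the left and right edges has horizontal
centrality 1, whereas a pixel 1 unit from the left edge and 5 units from the right
edge has centrality 1/30.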


Optimum α-cut

Given the $\mu_c(x_{mn})$ plane developed in the previous stage with the pixels having
been assigned values indicating their degree of membership to 'Core line', the
optimum (in the sense of minimizing ambiguity in geometry or in spatial domain)
skeleton can be extracted from one of its α-cuts having minimum comp($\mu$) value
(Eq. (16)). The α-cut of $\mu_c(x_{mn})$ is defined as

$\mu_{c\alpha} = \{ x_{mn} \in X \mid \mu_c(x_{mn}) \ge \alpha, \ 0 < \alpha < 1 \}$    (54)

Modification of α will therefore result in different fuzzy skeleton planes with varying
comp($\mu$) value. As α increases, the comp($\mu$) value initially decreases to a certain
minimum and then, for a further increase in α, the comp($\mu$) measure increases.

The initial decrease in comp($\mu$) value can be explained by observing that for
every value of α, the border pixels having $\mu$-values less than α are not taken into
consideration. So, both area (Eq. (13)) and perimeter (Eq. (14)) are less than those for
the previous value of α. But the decrease in area is more than the decrease in
perimeter and hence the compactness (Eq. (16)) decreases (initially) to a certain
minimum corresponding to a value α = α', say.

Further increase in α (i.e., for α > α') results in a $\mu_{c\alpha}$ consisting of a
number of disconnected regions (because the majority of core line pixels are dropped
out). As a result, the decrease in perimeter here is more than the decrease in area and
comp($\mu$) increases. The $\mu_{c\alpha'}$ plane having minimum compactness value can be
taken as an optimum fuzzy skeleton version of the image X. This is optimum in the
sense that for any other selection of α (i.e., α ≠ α') the comp($\mu$) value would be
greater.
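
The search over α can be sketched as below. The compactness measure of Eq. (16) is
not reproduced in this section, so it is passed in as a caller-supplied function
`comp`; this parameter, the function names, and the choice to keep the original
membership values of the retained pixels (setting the others to zero) are
assumptions made for illustration.

```python
def alpha_cut(mu_c, alpha):
    """Keep memberships >= alpha, Eq. (54); drop the remaining pixels to 0."""
    return [[v if v >= alpha else 0.0 for v in row] for row in mu_c]


def best_alpha(mu_c, alphas, comp):
    """Return the candidate alpha whose alpha-cut minimizes comp(plane)."""
    return min(alphas, key=lambda a: comp(alpha_cut(mu_c, a)))
```

In use, `alphas` would be a grid of candidate values in (0, 1) and `comp` an
implementation of the compactness of a fuzzy plane.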

If a nonfuzzy (crisp) single-pixel width skeleton is desired, it can be obtained
by a contour tracking algorithm [24] which takes into account the direction of
contour, multiple crossing pixels, lost path due to spurious wiggles etc. based on
octal chain code.

Fig. 11 shows the optimum fuzzy skeleton of the biplane image (Fig. 10). This
corresponds to α = 0.55. The connectivity of the skeleton in the optimum version
can be preserved, if necessary, by inserting pixels having intensity equal to the
minimum of those of pairs of neighbors in the object.

Figure 11 Optimum fuzzy skeleton of biplane.
PRIMITIVE EXTRACTION

In picture recognition and scene analysis problems, the structural information is 
very abundant and important, and the recognition process includes not only the 
capability to assign the input pattern to a pattern class, but also the capacity to 
describe the characteristics of the pattern that make it ineligible for assignment to 
another class. In these cases the recognition requirement can only be satisfied by a
description of the pattern rather than by classification.

In such cases complex patterns are described as hierarchical or tree-like structures
of simpler subpatterns and each simpler subpattern is again described in terms of even
simpler subpatterns and so on. Evidently, for this approach to be advantageous, the
simplest subpatterns, called pattern primitives, are to be selected.

Another activity which needs attention in this connection is shape analysis,
which has become an important subject in its own right. Shape analysis is of
primary importance in feature/primitive selection and extraction problems. Shape
analysis also has two approaches, namely, description of shape in terms of scalar
measurements and through structural descriptions. In this connection, it needs to be
mentioned that shape description algorithms should be information-preserving in the
sense that it is possible to reconstruct the shapes with some reasonable
approximation from the descriptors.

This section presents a method [24] to demonstrate an application of the theory 
of fuzzy sets in automatic description and primitive extraction of gray-tone edge- 
detected images. The ultimate aim is to recognize the pattern using syntactic 
approach as described in the next section. 

The method described here provides a natural way of viewing the primitives in 
terms of arcs with varying grades of membership from 0 to 1. 

Encoding 

The gray tone contour of an image can be encoded into one-dimensional symbol 
strings using the rectangular (octal) array method. The directions of the octal codes 
are shown in Figure 12. An octal code is used to describe a w-pixel (w > 1) length
contour by taking the maximum of its grades of membership corresponding to
'vertical', 'horizontal' and 'oblique' lines. This approximation of using a w-pixel
(instead of one-pixel) length line saves computational time and storage requirement
without affecting the system performance.

$\mu_V(x)$, $\mu_H(x)$ and $\mu_{ob}(x)$ representing the membership functions for vertical,
horizontal and oblique lines respectively of a line segment x making an angle $\theta$
with the horizontal line H (Figure 13) may be defined as [13,24]

$\mu_V(x) = 1 - |1/m_x|^{F_e}, \quad |m_x| > 1,$
$\qquad = 0$ otherwise    (55)

$\mu_H(x) = 1 - |m_x|^{F_e}, \quad |m_x| < 1,$
$\qquad = 0$ otherwise    (56)

$\mu_{ob}(x) = 1 - |(\theta - 45)/45|^{F_e}, \quad 0 < |m_x| < \infty,$
$\qquad = 0$ otherwise    (57)

Figure 12 The directions of octal codes.

Figure 13 Membership function for vertical and horizontal lines.

$F_e$ is a positive constant which controls the fuzziness in a set and $m_x = \tan\theta$.
The equations (55-57) are such that

$\mu_V(x) \to 1$ as $|\theta| \to 90°$,

$\mu_H(x) \to 1$ as $|\theta| \to 0°$,

$\mu_{ob}(x) \to 1$ as $|\theta| \to 45°$,

and

$\mu_V(x) \gtrless \mu_H(x)$ as $|\theta| \gtrless 45°$.


The details of encoding technique are available in [13]. 
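
The membership functions (55)-(57) can be evaluated with the sketch below. It
assumes the angle is given in degrees with $|\theta| \le 90$, uses $|\theta|$ in (57) so that
the stated limit at 45° holds for negative angles as well, and treats $|\theta| = 90°$ as
the case $|m_x| = \infty$; the function name and the default $F_e = 0.25$ are choices made for
this illustration only.

```python
import math


def line_memberships(theta_deg, fe=0.25):
    """Return (mu_V, mu_H, mu_ob) of Eqs. (55)-(57) for an angle in degrees."""
    t = abs(theta_deg)
    if t == 90:
        return 1.0, 0.0, 0.0                 # vertical line: |m_x| is infinite
    m = abs(math.tan(math.radians(t)))       # |m_x| = |tan(theta)|
    mu_v = 1.0 - (1.0 / m) ** fe if m > 1 else 0.0
    mu_h = 1.0 - m ** fe if m < 1 else 0.0
    mu_ob = 1.0 - abs((t - 45.0) / 45.0) ** fe if m > 0 else 0.0
    return mu_v, mu_h, mu_ob
```

The octal code of a segment is then associated with the largest of the three grades,
as described at the start of this subsection.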

Segmentation and Contour Description 

The next task before extraction of primitives and description of contours is the 
process of segmentation of the octal coded strings. Splitting up of a chain is 
dependent on the constant increase/decrease in code values. For extracting an arc, the 
string is segmented at a position whenever a decrease/increase after constant 
increase/decrease in values of codes is found [13]. Again, if the number of codes 
between two successive changes exceeds a prespecified limit, a straight line is said to 
exist between two curves. In the case of a closed curve, a provision may be kept for 
increasing the length of the chain by adding first two starting codes to the tail of the 
string. This enables one to take the continuity of the chain into account in order to 
reflect its proper segmentation [13]. 

After segmentation one needs to provide a measure of curvature along with
direction of the different arcs and also to measure the length of lines in order to
extract the primitives. The degree of 'arcness' of a line segment x is obtained using
the function

$\mu_{arc}(x) = (1 - a/l)^{F_e}$.    (58)

Figure 14 Membership function for arc.

Figure 15 Nuclear pattern of brain cell.

a is the length of the line joining the two extreme points of an arc x (Figure 14), l
is the arc-length such that the lower the ratio a/l is, the higher is the degree of
'arcness'.

For example, consider a sequence of codes 5 6 6 7 denoting an arc x. For
computing its l, note that if a code represents an oblique line, the corresponding
increase in arc-length would be $\sqrt{2}$, otherwise the increase is by unity. Arc diameter a
is computed by measuring the resulting shifts $\Delta m$ and $\Delta n$ of spatial coordinates (along
mth and nth axes) due to those codes in question. For the aforesaid example we have

$\Delta m = 1 + 0 + 0 - 1 = 0$,

$\Delta n = -1 - 1 - 1 - 1 = -4$,

$a = \sqrt{\Delta m^2 + \Delta n^2} = 4$,

$l = 4.828$,

$\mu_{arc}(x) = 0.643$ (for $F_e = 0.25$).

Since the initial code (5) is lower than the final code (7), the sense of the curve is
positive (clockwise).

Similarly, for sequences 5 6 and 5 6 7 the $\mu_{arc}$ values are respectively 0.52 and
0.682. The figures thus obtained for the different sequences agree well with our
intuition as far as their degree of arcness (curvature) is concerned. Also note that
sequences like 5 5 6 6 7 7 and 5 5 5 6 6 6 7 7 7 have the same $\mu_{arc}$
values as obtained with the sequence 5 6 7. Similarly, the sequences 5 5 6 6 and 5 6
have the same $\mu_{arc}$ value.
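
The worked example can be reproduced with the short sketch below. Only the
coordinate shifts of codes 5, 6 and 7 are included, since these are the ones that can
be read off the example ($\Delta m$, $\Delta n$ per code); the shifts of the remaining octal codes
depend on Figure 12 and are not assumed here. The names `SHIFTS` and `arcness` are
hypothetical.

```python
import math

# (dm, dn) shift per code, read off the worked example for codes 5, 6 and 7.
SHIFTS = {5: (1, -1), 6: (0, -1), 7: (-1, -1)}


def arcness(codes, fe=0.25):
    """Degree of arcness, Eq. (58), of a chain of octal codes."""
    dm = sum(SHIFTS[c][0] for c in codes)
    dn = sum(SHIFTS[c][1] for c in codes)
    a = math.hypot(dm, dn)                         # chord length
    length = 0.0
    for c in codes:
        step_m, step_n = SHIFTS[c]
        oblique = step_m != 0 and step_n != 0
        length += math.sqrt(2) if oblique else 1.0  # arc length l
    # Guard against a tiny negative value caused by rounding on straight chains.
    return max(0.0, 1.0 - a / length) ** fe
```

Because the measure depends only on the ratio a/l, repeating every code of a chain
the same number of times leaves the value unchanged, which is the invariance noted
above.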

Example 3 

To explain the aforesaid features, let us consider the Fig. 15 showing a two-tone 
contour of nuclear pattern of brain neurosecretory cells [25]. The string descriptions 
of Fig. 15 in terms of arcs (of different arcness) and lines are shown below. 
11 , 187 , 7881 , 112233 , 321 , 11 , 112 , 223 , 334 , 44 , 445 , 543 , 3345 , 55 , 56 , 667 ,
7665 , 55 , 5667 , 78 , 776 , 66 , 678 , 11

$L \ \bar{V}_{.68} \ \bar{V}_{.64} \ V_{.68} \ \bar{V}_{.64} \ L \ \bar{V}_{.49} \ V_{.51} \ V_{.49} \ L \ V_{.51} \ \bar{V}_{.68} \ V_{.49} \ L \ V_{.52} \ V_{.51} \ \bar{V}_{.64} \ L \ V_{.64} \ V_{.32} \ V_{.49} \ L \ V_{.68} \ L$

Here, L, V, and $\bar{V}$ denote the straight line, 'clockwise arc' and 'anticlockwise arc'
respectively. The suffix of V represents the degree of arcness of the arc V. The
positions of segmentation are shown by a comma (,).

It is to be mentioned here that the approach adopted here to define and to extract 
arcs with varying grades of membership is not the only way of doing this. One may 
change the procedure so as to result in segments with membership values different 
from those mentioned here. 

FUZZY SYNTACTIC ANALYSIS 

The syntactic approach to pattern recognition involves the representation of a
pattern by a string of concatenated subpatterns called primitives. These primitives
are considered to be the terminal alphabets of a formal grammar whose language is
the set of patterns belonging to the same class. Recognition therefore involves a
parsing of the string.

The syntactic approach has incorporated the concept of fuzzy sets at two levels.
First, the pattern primitives are themselves considered to be labels of fuzzy sets, i.e.,
such subpatterns as 'almost circular arcs', 'gentle', 'fair' and 'sharp' curves are
considered. Secondly, the structural relations among the subpatterns may be fuzzy,
so that the formal grammar is fuzzified by the weighted production rules and the grade
of membership of a string is obtained by min-max composition of the grades of the
productions used in the derivations. Inference of a fuzzy grammar is another
interesting problem, in which the productions as well as the weights of these rules
are inferred from the specified fuzzy language. In this section we explain the
elementary notions of fuzzy grammar with examples.

The formal definition of a fuzzy grammar is as follows: 

Definition: A fuzzy grammar FG is a 6-tuple
$FG = (V_N, V_T, P, S, J, \mu)$

where

$V_N$ : a set of non-terminals (which are essentially labels for certain
fuzzy subsets called fuzzy syntactic categories of $V_T^*$)

$V_T$ : a set of terminals, such that $V_N \cap V_T = \emptyset$ (null set)

P : a set of production (or rewriting) rules of the type $\alpha \to \beta$ ($\alpha$ is
replaced by $\beta$), where $\alpha, \beta \in (V_N \cup V_T)^*$

S : a starting symbol, such that $S \in V_N$

J : $\{ r_i \mid i = 1, 2, \ldots, n, \ n = \text{cardinality of } P \}$ is the set of labels for
the production rules

$\mu$ : a mapping $\mu : J \to [0, 1]$ such that $\mu(r_i)$ denotes the membership in
P of the rule labelled $r_i$

$V_T^*$ : the set of finite strings obtained by the concatenation of elements
of $V_T$

$(V_N \cup V_T)^*$ : the set of finite strings obtained by the concatenation of elements
of $V_N \cup V_T$

A fuzzy grammar FG generates a fuzzy language L(FG) as follows:

A string $X \in V_T^*$ is said to be in L(FG) iff it is derivable from S and its grade of
membership $\mu_{L(FG)}(X)$ in L(FG) is > 0, where

$\mu_{L(FG)}(X) = \max_{1 \le k \le m} \left[ \min_{1 \le i \le l_k} \mu(r_i^k) \right]$    (59)

where m is the number of derivations that X has in FG; $l_k$ is the length of the kth
derivation chain, k = 1, 2, ..., m; and $r_i^k$ is the label of the ith production used in
the kth derivation chain, i = 1, 2, ..., $l_k$.

Clearly, if a production $\alpha \to \beta$ be visualized as a chain link of strength $\mu(r)$, r
being the label of $\alpha \to \beta$, then the strength of a derivation chain is the strength of its
weakest link, and hence

$\mu_{L(FG)}(X)$ = strength of the strongest derivation chain from
S to X for all $X \in V_T^*$.

Example 4: Suppose $FG_1 = (\{A, B, S\}, \{a, b\}, P, S, \{1, 2, 3, 4\}, \mu)$ where J, P and
$\mu$ are as follows

1: $S \to AB$ with $\mu(1) = 0.8$

2: $S \to aSb$ with $\mu(2) = 0.2$

3: $A \to a$ with $\mu(3) = 1$

4: $B \to b$ with $\mu(4) = 1$

Clearly, the fuzzy language generated is $FL_1 = \{X \mid X = a^n b^n, \ n = 1, 2, \ldots\}$

with $\mu_{FL_1}(a^n b^n) = 0.8$ if n = 1
$\qquad\qquad\qquad = 0.2$ if n > 1
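
Equation (59) applied to Example 4 can be checked with the sketch below. The
derivations are supplied explicitly as lists of rule labels rather than being found
by a parser; the names `string_membership`, `MU` and `derivation_anbn` are
hypothetical and used only for this illustration.

```python
def string_membership(derivations, mu):
    """Eq. (59): strength of the strongest chain, each chain as weak as its weakest rule."""
    return max((min(mu[r] for r in chain) for chain in derivations), default=0.0)


# Rule memberships of FG_1 in Example 4.
MU = {1: 0.8, 2: 0.2, 3: 1.0, 4: 1.0}


def derivation_anbn(n):
    """Rule labels used to derive a^n b^n: rule 2 applied n-1 times, then 1, 3, 4."""
    return [2] * (n - 1) + [1, 3, 4]
```

The string ab uses only rules 1, 3 and 4, so its membership is 0.8, whereas any
longer string must use rule 2 at least once and therefore has membership 0.2.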

Example 5: Consider the fuzzy grammar $FG_2 = (\{S, A, B\}, \{a, b, c\}, P, S, J, \mu)$
where J, P and $\mu$ are as follows

$r_1$: $S \to aA$ with $\mu(r_1) = \mu_H(a)$

$r_2$: $A \to bB$ with $\mu(r_2) = \mu_V(b)$

$r_3$: $B \to c$ with $\mu(r_3) = \mu_{ob}(c)$

the primitives a, b, c being 'horizontal', 'vertical' and 'oblique' directed line segments
respectively; the terms 'horizontal', 'vertical' and 'oblique' are taken to be fuzzy with
membership values $\mu_H$, $\mu_V$ and $\mu_{ob}$ respectively as defined in the previous section.

Further, the concatenation considered is of the 'head-tail' type. Hence the only
string generated is X = abc, which is in reality a triangle having membership

$\mu_{L(FG_2)}(abc) = \min(\mu_H(a), \mu_V(b), \mu_{ob}(c))$

which attains its maximum value 1 when abc is an isosceles right triangle. Thus
$L(FG_2)$ is the fuzzy set of isosceles right triangles.

The membership of a pattern triangle given in Fig. 16b is
min (1.0, 1.0, 0.66) = 0.66

Figure 16 (a) Primitive (b) Production of Triangle and Letter B.

Example 6: Consider the following fuzzy grammar for generating the fuzzy set
representing the English upper case letter B

$V_N = \{S, A, B, C, D\}$

$V_T = \{a, b\}$

where the primitive a denotes a directed 'vertical' (fuzzy) line segment and b denotes a
directed arc (clockwise). The concatenation considered here is again of the 'head-tail'
type.

Also J, P and $\mu$ are as follows

$r_1$: $S \to aB$ with $\mu(r_1) = \mu_V(a)$

$r_2$: $B \to aC$ with $\mu(r_2) = \mu_V(a)$

$r_3$: $C \to bD$ with $\mu(r_3) = \mu_{cir}(b)$

$r_4$: $D \to b$ with $\mu(r_4) = \mu_{cir}(b)$

The string generated is X = aabb having the following membership in set B.

$\mu_B(X) = \min(\mu_V(a)_l, \ \mu_V(a)_u, \ \mu_{cir}(b)_u, \ \mu_{cir}(b)_l)$

where the suffixes l and u denote the locations ('lower' and 'upper') of the primitives
a and b.

For the pattern given in Fig. 16b

$\mu_V(a)_u = \mu_V(a)_l = 0.83, \quad \mu_{cir}(b)_u = 0.36, \quad \mu_{cir}(b)_l = 0.5$

so that $\mu_B(X) = 0.36$.

DePalma and Yau [26] introduced the concept of fractionally fuzzy grammars 
with a view to tackling some of the drawbacks of fuzzy grammars which make them 
unsuitable for use in pattern recognition problems. Some of these drawbacks are as 
follows: 

(i) Memory requirements are greatly increased when fuzzy grammars are
implemented with the help of parsing algorithms that require backtracking. This is
because when fuzzy grammars are being used, it is not sufficient to keep track of the
current derivation tree alone. The fuzzy value at each preceding step must also be
simultaneously remembered at each node, in case backtracking is needed at some
step.

(ii) All strings in the language L(FG) generated by a fuzzy grammar FG can be 
classified into a finite number of subsets by their membership in the language. The 
number of such subsets is strictly limited by the number of productions in FG. 

Definition: A fractionally fuzzy grammar (FFG) is a 7-tuple

$FFG = (V_N, V_T, P, S, J, g, h)$

where $V_N$, $V_T$, P, S, J are as before, and g and h are mappings from J into the set of
non-negative integers such that $g(r) \le h(r)$ for all $r \in J$. Various applications of
fuzzy and fractionally fuzzy grammars are available in [13,25,27].

The incorporation of the element of fuzziness in defining 'sharp', 'fair' and
'gentle' curves in the grammars enables one to work with a much smaller number of
primitives. By introducing fuzziness in the physical relations among the primitives,
it was also possible to use the same set of production rules and non-terminals at each
stage. This is expected to reduce, to some extent, the time required for parsing in the
sense that parsing needs to be done only once at each stage, unlike the case of the 
non-fuzzy approach [28], where each string has to be parsed more than once, in 
general, at each stage. However, this merit has to be balanced against the fact that 
the fuzzy grammars are not as simple as the corresponding nonfuzzy grammars. 
1 INTRODUCTION

Natural language is one of the most complicated structures man
has met with. It plays a fundamental role not only in human communication
but even in the human way of thinking and regarding the world. Therefore, it
is extremely important to study it in all its respects. Much has been done in
understanding its structure, especially the phonetic and syntactic aspects. Less,
however, is understood about its semantics. There are many linguistic systems, often
based on set theory and logic, attempting to grasp (at least some phenomena of)
natural language. However, none of them is fully accepted and satisfactory
in all respects.

A serious obstacle on the way to this goal is, besides the complexity
mentioned, also the vagueness of the meaning of separate lexical units as well
as of sentences and longer text. On the other hand, the capability of the human
mind to take vagueness into account and to handle it, which is reflected in the
semantics of natural language, is the main cause of the extreme power of natural
language to convey relevant and succinct information. There is no way out other than
to cope with vagueness in the models of natural language semantics.

Fuzzy set theory is a mathematical theory whose program is to provide
us with methods and tools which may make it possible for us to grasp vague
phenomena instrumentally. Therefore, it seems to be appropriate for use in modelling
natural language semantics.

In this paper, we provide the reader with an overview of the main
results obtained so far in processing some phenomena of the semantics of
natural language using fuzzy sets.





2 FUZZY SETS 

In this section, we briefly touch the notion of a fuzzy set, especially 
those aspects which are important for our further explanation. Words and more 
complex syntagms 1 of natural language can in general be considered as names 
of properties encountered by a man in the world. 

An object is a phenomenon to which we concede its individuality keep- 
ing it together and separating it from the other phenomena. Objects are usually 
accompanied by properties. In general, however, the same property accompa- 
nies more objects. If all sudi objects are grouped together then they can be 
seen as one, new object of a special kind. A grouping of objects being seen as 
an object is called a class. Hence, if tp is a property then there is a class X of 
objects s having <p. In symbols we can write 

*-{*»¥>(*)}• ( 1 ) 

If the property $\varphi$ is simple and sharp then the class $X$ forms a set.
However, most properties one meets in the world are not of this kind. Then
the class $X$ is not sharply delimited, i.e. there is no way to name or
imagine all the objects $x$ from $X$ without any doubt whether a given object $x$
has the property $\varphi$ or not. Thus, we have encountered the phenomenon of vagueness.
This doubt, which probably stems from the inner, still not
understood, complexity of $\varphi$, is the core of the phenomenon of vagueness being
encountered. Classical mathematics has no other possibility than to model the
grouping $X$ using (sharp) sets. Therefore, the result cannot be satisfactory
from the very beginning. Unlike classical set theory, fuzzy set theory attempts
to find a more suitable model of the class (1).

Let us take the objects $x$ from some sufficiently big set $U$ called the
universe. Note that this assumption is not restrictive since such a set always
exists. For example, consider the property $\varphi$ := to be a small number². Then
there surely exists a number $z \in N$ which is not small (e.g. $z = 2^{10}$) and we may
put $U = \{x \in N \mid x < z\}$.

Our doubt whether an object $x \in U$ has the given property $\varphi$ can be
expressed by means of a certain scale $L$ having a smallest element 0 and a greatest
element 1. Thus, 1 expresses that $\varphi(x)$ ($x$ has the property $\varphi$) holds with
no doubt, while 0 means that $\varphi(x)$ does not hold at all. We obtain a function

$$A : U \to L \qquad (2)$$

assigning an element $Ax \in L$ from the scale $L$ to each element $x \in U$.
This function serves us as a certain characterization of the class $X$ in (1) and
it is called the fuzzy set. We can view the fuzzy set $A$ as a set

$$\{Ax/x;\ x \in U\}. \qquad (3)$$

¹ A syntagm is any part of a sentence (even a word or a whole sentence) that is
constructed according to the grammatical rules.

² A natural number, for simplicity.


The element $Ax \in L$ is called the membership degree of $x$ and thus (2) is often
called the membership function of the fuzzy set $A$. One can see that a fuzzy set is
identified with its membership function. If $A$ is a fuzzy set (3) in the universe $U$
then we write $A \subseteq U$. The scale $L$ is usually put to be $L = \langle 0, 1 \rangle$ and it is
assumed to form the structure

$$\mathcal{L} = \langle \langle 0, 1 \rangle, \vee, \wedge, \otimes, \to, 0, 1 \rangle \qquad (4)$$

where $\vee$ and $\wedge$ are the operations of supremum (maximum) and infimum
(minimum) respectively, $\otimes$ is the operation of bold product defined by

$$a \otimes b = 0 \vee (a + b - 1)$$

and $\to$ is the operation of residuum defined by

$$a \to b = 1 \wedge (1 - a + b)$$

for all $a, b \in \langle 0, 1 \rangle$.
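The two operations can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names are chosen here, and exact fractions are used so that the grid check of the adjointness property ($a \otimes b \le c$ iff $a \le b \to c$) is not disturbed by rounding.

```python
from fractions import Fraction


def bold_product(a, b):
    """Bold product: 0 v (a + b - 1)."""
    return max(0, a + b - 1)


def residuum(a, b):
    """Residuum: 1 ^ (1 - a + b)."""
    return min(1, 1 - a + b)


def adjointness_holds(steps=10):
    """Check a (x) b <= c  iff  a <= (b -> c) on a finite grid in <0, 1>."""
    grid = [Fraction(k, steps) for k in range(steps + 1)]
    return all(
        (bold_product(a, b) <= c) == (a <= residuum(b, c))
        for a in grid
        for b in grid
        for c in grid
    )
```

For instance, with $a = 0.7$ and $b = 0.5$ the bold product is $0.2$ and the residuum $a \to b$ is $0.8$.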

There are deep reasons for the choice of this structure. The reader may
find them in [13,16]. The operations with fuzzy sets are derived from the structure (4).
The basic ones are

union

$$C = A \cup B \quad \text{iff} \quad Cx = Ax \vee Bx$$

intersection

$$C = A \cap B \quad \text{iff} \quad Cx = Ax \wedge Bx$$

bold intersection

$$C = A \mathbin{\boldsymbol{\cap}} B \quad \text{iff} \quad Cx = Ax \otimes Bx$$

residuum

$$C = A \ominus B \quad \text{iff} \quad Cx = Ax \to Bx$$

On the basis of residuum, one can define the complement $\bar{A} = A \ominus \emptyset$,³
where $\emptyset$ is the empty fuzzy set

$$\emptyset = \{0/x;\ x \in U\}.$$

³ This definition gives $\bar{A}x = 1 - Ax$ for all $x \in U$, which is the usual definition of the
complement.
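As a minimal sketch of these pointwise definitions, a fuzzy set on a finite universe can be stored as a Python dictionary from elements to membership degrees. The representation, the function names and the sample sets `A` and `B` are assumptions made for this illustration; both sets are assumed to share the same universe.

```python
from fractions import Fraction as Fr


def union(A, B):
    return {x: max(A[x], B[x]) for x in A}


def intersection(A, B):
    return {x: min(A[x], B[x]) for x in A}


def bold_intersection(A, B):
    # Pointwise bold product 0 v (a + b - 1).
    return {x: max(0, A[x] + B[x] - 1) for x in A}


def residuum(A, B):
    # Pointwise residuum 1 ^ (1 - a + b).
    return {x: min(1, 1 - A[x] + B[x]) for x in A}


def complement(A):
    # Complement defined as the residuum of A with the empty fuzzy set.
    empty = {x: 0 for x in A}
    return residuum(A, empty)


# Hypothetical sample fuzzy sets on the universe {x1, x2, x3}.
A = {"x1": Fr(1, 5), "x2": Fr(7, 10), "x3": Fr(1)}
B = {"x1": Fr(1, 2), "x2": Fr(1, 2), "x3": Fr(0)}
```

Here `complement(A)` yields the degrees 4/5, 3/10 and 0, i.e. $1 - Ax$ for each element, in agreement with the footnote on the complement.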
In modelling of natural language semantics it is necessary to introduce
also new, additional operations. Put

$$a \leftrightarrow b = (a \to b) \wedge (b \to a)$$

(biresiduation) and

$$a^p = \underbrace{a \otimes \cdots \otimes a}_{p\text{-times}}$$

(power) for all $a, b \in \langle 0, 1 \rangle$.

When introducing a new $n$-ary operation $o$ on $L$, the following fitting
condition must be fulfilled: there are $p_1, \ldots, p_n$ such that

$$(a_1 \leftrightarrow b_1)^{p_1} \otimes \cdots \otimes (a_n \leftrightarrow b_n)^{p_n} \le o(a_1, \ldots, a_n) \leftrightarrow o(b_1, \ldots, b_n) \qquad (5)$$

holds for every $a_i, b_i \in L$, $i = 1, \ldots, n$. The justification of the fitting condition
can be found in [16,15]. Note that all the basic operations fulfil the fitting
condition. Moreover, the following holds true:

Theorem 1 All the operations derived from operations fulfilling the
fitting condition fulfil it as well.

Proof - see [16].


The following operations are fitting:

product

$$a \cdot b$$

bounded sum

$$a \oplus b = 1 \wedge (a + b)$$

concentration

$$\mathrm{CON}(a) = a^2$$

dilation⁴

$$\mathrm{DIL}(a) = 2a - a^2$$

intensification

$$\mathrm{INT}(a) = \begin{cases} 2a^2 & a \in \langle 0, 0.5 \rangle \\ 1 - 2(1 - a)^2 & a \in (0.5, 1 \rangle \end{cases}$$

for all $a, b \in \langle 0, 1 \rangle$.

⁴ The widely used operation of dilation $\mathrm{DIL}(a) = a^{0.5}$ is not fitting and thus it cannot be
used.
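The fitting condition (5) for a unary operation can be probed numerically. The sketch below is an illustration under stated assumptions, not a proof: it uses the closed forms $a \leftrightarrow b = 1 - |a - b|$ and $a^p = 0 \vee (pa - (p - 1))$ that follow from the definitions of residuum and bold product, checks (5) only on a finite grid and only for one fixed exponent $p$, and all function names are chosen here.

```python
import math


def biresiduum(a, b):
    # (a -> b) ^ (b -> a) reduces to 1 - |a - b| for the residuum above.
    return 1.0 - abs(a - b)


def power(a, p):
    # a (x) ... (x) a, p times, reduces to 0 v (p*a - (p - 1)).
    return max(0.0, p * a - (p - 1))


def con(a):
    return a * a


def dil(a):
    return 2 * a - a * a


def intensify(a):
    return 2 * a * a if a <= 0.5 else 1 - 2 * (1 - a) ** 2


def is_fitting(op, p, steps=100, tol=1e-9):
    """Check (a <-> b)^p <= op(a) <-> op(b) on a grid; tol absorbs rounding."""
    grid = [k / steps for k in range(steps + 1)]
    return all(
        power(biresiduum(a, b), p) <= biresiduum(op(a), op(b)) + tol
        for a in grid
        for b in grid
    )
```

On this grid, `con`, `dil` and `intensify` pass the check with `p = 2`, while `math.sqrt` fails it: near 0 the square root changes much faster than its argument, which illustrates why the square-root dilation is excluded.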
The operations in $L$ lead to the operations with fuzzy sets as follows.
Let

$$o : L^n \to L$$

and $A_1, \ldots, A_n \subseteq U$ be fuzzy sets. Then $o$ is a basis of the operation $O$ assigning
a fuzzy set $C \subseteq U$ to $A_1, \ldots, A_n$ when we put

$$C = O(A_1, \ldots, A_n) \quad \text{iff} \quad Cx = o(A_1 x, \ldots, A_n x) \qquad (6)$$

for every $x \in U$.

For example, we can define the operation of bounded sum of fuzzy sets
by putting

$$C = A \uplus B \quad \text{iff} \quad Cx = Ax \oplus Bx$$

for every $x \in U$.

A very important notion is that of a fuzzy cardinality of a fuzzy set.
There are several kinds of them [13,22]. We will use the following ones defined
for fuzzy sets with finite support: Absolute fuzzy cardinality of $A \subseteq U$:

$$\mathrm{FCard}(A) = \{\alpha_n/n;\ n \in N\} \qquad (7)$$

where

$$\alpha_n = \bigvee \{\beta;\ \mathrm{Card}(A_\beta) = n\}$$

and $A_\beta$ is a $\beta$-cut of $A$. Relative fuzzy cardinality of $A$ with respect to $B$ where
$A, B \subseteq U$:

$$\mathrm{FCard}_A(B) = \{\alpha_r/r;\ r \in Re\} \qquad (8)$$

where

$$\alpha_r = \bigvee \left\{\beta;\ \frac{\mathrm{Card}(A_\beta \cap B_\beta)}{\mathrm{Card}(A_\beta)} = r \right\}.$$

The notions introduced above will be used in the sequel. For other
notions and operations with fuzzy sets see e.g. [13,4].
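The absolute fuzzy cardinality (7) of a fuzzy set with finite support can be computed directly from the sorted membership degrees. The sketch below assumes that the $\beta$-cut is $A_\beta = \{x;\ Ax \ge \beta\}$ with $\beta$ ranging over $(0, 1\rangle$; the text does not spell this out, so this is an assumption of the illustration, as are the function name and the dictionary representation. Cardinalities $n$ with $\alpha_n = 0$ are omitted from the result.

```python
def fcard(A):
    """Absolute fuzzy cardinality of a finite fuzzy set given as {element: degree}."""
    degrees = sorted((d for d in A.values() if d > 0), reverse=True)
    result = {}
    for n in range(len(degrees) + 1):
        # Card(A_beta) = n exactly for beta in the half-open interval (lower, upper].
        upper = degrees[n - 1] if n > 0 else 1.0
        lower = degrees[n] if n < len(degrees) else 0.0
        if upper > lower:
            # The supremum of a non-empty interval (lower, upper] is upper.
            result[n] = upper
    return result
```

For the hypothetical set `{"a": 1.0, "b": 0.7, "c": 0.7, "d": 0.2}` no cut has exactly two elements, because the two degrees 0.7 enter the cut together, and no cut is empty, because one element has degree 1.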


3 THE USE OF FUZZY SETS IN MODELLING OF 
NATURAL LANGUAGE SEMANTICS 

3.1 The general representation of the meaning 

Let us turn our attention to the problem of grasping natural language
semantics. A sentence of natural language can be viewed from several points of view.
In classical linguistics, it is usual to talk about the representation of a sentence on
various levels. In the system called the functional generative description of
natural language (FGD) (see [18]), five levels are differentiated, namely phonetic
(PH) (how a sentence is composed as a system of sounds), phonemic (PM)
(how words of a sentence are composed), morphemic (ME) (how a sentence is
composed of its words), surface syntax (SS) (the system of grammatical rules)
and tectogrammatical (TR), which is the highest level, corresponding to the
semantics. The latter is also called the deep structure of the sentence and this
structure is the target of possible applications of fuzzy set theory. As has
already been stated, words and more complex syntagms of natural language can
be understood to be names of properties encountered by people in the world.
In the light of the previous section, fuzzy sets can be used as follows: let $\mathcal{A}$ be
a syntagm of natural language and $\varphi$ the corresponding property. If the class
(1) determined by $\varphi$ is approximated by a fuzzy set $A \subseteq U$ then the meaning
$M(\mathcal{A})$ of $\mathcal{A}$ is

$$M(\mathcal{A}) = A. \qquad (9)$$

Thus, our job consists in determining the membership function $A$.

However, the situation is by no means simple, as not every word of
natural language corresponds to such a property and, above all, there are various
relations between words. Thus, determination of the membership function which
corresponds to a complex syntagm may be a very complicated task.

On the tectogrammatical level, the meaning of a sentence is represented as
a complex dependency structure which can be depicted in the form of a labelled
graph. For example, the sentence

Peter writes a short letter to his friend

can be depicted in the form of the graph in Fig. 1.

For a detailed explanation of this graph see [18,17]. The letters t and f
mean topic and focus, respectively. A topic is the part of a sentence containing the
theme which is spoken about, and the focus contains the new information conveyed
by the sentence. Of course, one surface structure of a sentence may lead to
several deep structures. Up to now, we are far from a detailed understanding
of all the nuances of sentence semantics. The present state of the art makes
it possible to model the meaning of only some simple syntagms, i.e. certain
branches of the tectogrammatical tree such as that in Fig. 1. This will be
discussed in the subsequent sections.


3.2 Fuzzy semantics of selected syntagms

First, let us stop at the modelling of the semantics of nouns. In general,
if $\mathcal{S}$ is a noun, then its meaning is a fuzzy set

$$M(\mathcal{S}) = S, \quad S \subseteq U.$$



[Figure 1: tree diagram with the nodes Peter, Friend, Letter, He and Short, each
marked as topic or focus, and the edge labels Appurt and Gener]

Figure 1: The tectogrammatical tree of the sentence Peter writes a short letter
to his friend


What is the universe $U$? It is a set of objects chosen in such a way that whenever
an object $x$ has the property $\varphi_S$ named by the noun $\mathcal{S}$ then $x \in U$. This can
be constructed e.g. as follows: Let $K$ be a set of generic elements called the
kernel space. For example, $K$ can be a union of all the objects described in our
dictionary, of those we have regarded during the last week, of those we see in
our flat etc. In short, $K$ should contain all the specific objects we have met or
imagine. Let $\mathcal{F}(K)$ be the set of all the fuzzy sets on $K$ and put

$$\mathcal{F}^n(K) = \underbrace{\mathcal{F}(\mathcal{F}(\ldots \mathcal{F}(K) \ldots))}_{n\text{-times}}$$

Let $E_K$ be the smallest set closed with respect to all the Cartesian
powers of $K$, of $\mathcal{F}^n(K)$, $n = 1, 2, \ldots$ and all the Cartesian products of these
elements. This set is called the semantic space. Then the universe of $\mathcal{S}$ is a
sufficiently big subset $U \subseteq E_K$.

A certain problem is the determination of the membership function.
There are several methods proposed in the literature (cf. [13]). The membership
function corresponding to the object nouns (e.g. table, car, donkey etc.)
could be constructed on the basis of the outer characteristics of elements. For
example, we may use proportions of some geometric patterns contained in
objects etc. A very often used method is statistical analysis of expert (subjective)
estimations. Several experiments have been described in the literature. Let us
mention that fuzzy methods are rather robust and thus exact determination of
the membership function is not as important as it might seem at first glance.
Experience suggests that even an individual estimation works well when it is
done carefully and seriously.

In practical applications, e.g. in artificial intelligence, it is not very
useful to model the meaning of nouns because we would have to find a proper
representation of their elements in the computer which, in fact, we do not need. The
most successful applications are based on modelling the meaning of adjectives
and of syntagms of the form

(quantifier -) adverb - adjective (- noun)

where the syntagm

adverb - adjective (10)

plays the crucial role. The most important (and very frequent) adjectives are
those inducing an ordering $<$ in the universe $U$. We will assume that $<$ is
linear. According to linguistic considerations as well as experiments (cf. [10]),
there are certain points $m, s, v \in U$ where $m < s < v$. The point $s$ is called
the semantic center. The adjectives inducing an ordering in $U$ usually form
antonyms which can be characterized as follows. Let $\mathcal{A}^-, \mathcal{A}^+$ be antonyms (e.g.
small - big, cold - hot etc.). Then their meanings are fuzzy sets

$$M(\mathcal{A}^-) = A^-$$

$$M(\mathcal{A}^+) = A^+$$

such that $\mathrm{Supp}\,A^- \subseteq \langle m, s)$ and $\mathrm{Supp}\,A^+ \subseteq (s, v \rangle$ where
$\mathrm{Supp}\,A = \{x \in U;\ Ax > 0\}$.⁵ We will often call $\mathcal{A}^-$ a negative and $\mathcal{A}^+$ a positive
adjective, respectively. There are also couples of antonyms for which a third member
$\mathcal{A}^0$ exists. Its meaning is

$$M(\mathcal{A}^0) = A^0$$

where the membership function $A^0$ has the property

$$s \in \mathrm{Ker}\,A^0 = \{x \in U;\ A^0 x = 1\}.$$ ⁶

A typical example of the adjective $\mathcal{A}^0$ is $\mathcal{A}^0$ := average. In the sequel we will call
$\mathcal{A}^0$ a zero adjective.

The curves corresponding to the fuzzy sets $A^-$, $A^+$, $A^0$ have characteristic
shapes depicted in Fig. 2. Note that they are sometimes called the $S^-$, $S^+$
and $\Pi$ fuzzy sets, respectively.

⁵ This set is called the support of the fuzzy set $A$.

⁶ This set is called the kernel of the fuzzy set $A^0$.

Figure 2: The membership functions corresponding to the meaning of the
negative, positive and zero syntagms.
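A general formula for all the three fuzzy sets is the following:

$$F(x, a_1, b_1, c_1, c_2, b_2, a_2) = \begin{cases}
0 & \text{if } x < a_1 \text{ or } x > a_2 \\
1 & \text{if } c_1 \le x \le c_2 \\
\frac{1}{2}\left(\frac{x - a_1}{b_1 - a_1}\right)^2 & \text{if } a_1 \le x < b_1 \\
1 - \frac{1}{2}\left(\frac{x - c_1}{c_1 - b_1}\right)^2 & \text{if } b_1 \le x < c_1 \\
1 - \frac{1}{2}\left(\frac{x - c_2}{b_2 - c_2}\right)^2 & \text{if } c_2 < x \le b_2 \\
\frac{1}{2}\left(\frac{x - a_2}{a_2 - b_2}\right)^2 & \text{if } b_2 < x \le a_2
\end{cases}$$

The meaning of the points $a_1, b_1, c_1, a_2, b_2, c_2 \in U$ is clear from Fig. 2.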
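A membership function of this six-parameter kind can be sketched in Python as follows. The quadratic pieces used here, which take the value 1/2 at the points $b_1$ and $b_2$, are an assumption of this sketch chosen so that the curve is continuous; the function name and the sample parameters are likewise illustrative. The sketch expects $a_1 < b_1 < c_1 \le c_2 < b_2 < a_2$ on every side that is actually evaluated.

```python
def membership(x, a1, b1, c1, c2, b2, a2):
    """Piecewise quadratic curve: 0 outside <a1, a2>, 1 on the kernel <c1, c2>."""
    if x < a1 or x > a2:
        return 0.0
    if c1 <= x <= c2:
        return 1.0
    if x < b1:
        # Rising branch between a1 and b1.
        return 0.5 * ((x - a1) / (b1 - a1)) ** 2
    if x < c1:
        # Rising branch between b1 and c1.
        return 1.0 - 0.5 * ((x - c1) / (c1 - b1)) ** 2
    if x <= b2:
        # Falling branch between c2 and b2.
        return 1.0 - 0.5 * ((x - c2) / (b2 - c2)) ** 2
    # Falling branch between b2 and a2.
    return 0.5 * ((x - a2) / (a2 - b2)) ** 2
```

With the hypothetical parameters `(0, 10, 20, 30, 40, 50)` the curve rises from 0 at `x = 0` through 0.5 at `x = 10` to 1 on the kernel from 20 to 30, and falls symmetrically on the right, which corresponds to a zero adjective; making one side degenerate gives the shapes of the negative and positive adjectives.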

The adverb in (10) is an intensifying one (e.g. very, highly, absolutely,
slightly etc.) and it is usually called the linguistic modifier in fuzzy set theory.
In general, the meaning of the intensifying adverb $m$ is a pair of functions

$$M(m) = \langle \zeta_m, \nu_m \rangle$$

where $\zeta_m : U \to U$ is a displacement function and $\nu_m : L \to L$ is a unary
operation fitting $L$.⁷ Hence, the meaning of the syntagm (10) is obtained using
the composition of functions

$$M(m\mathcal{A}) = \nu_m \circ A \circ \zeta_m. \qquad (11)$$

A typical example, widely used in fuzzy set theory, is the modifier very
defined as follows:

$$\nu_{very}(a) = \mathrm{CON}(a), \quad a \in \langle 0, 1 \rangle$$

and

$$\zeta_{very}(x) = x + (-1)^k\, d\, \|\mathrm{Ker}(A)\|$$

where $k = 1$ for $A^+$ or for $A^0$ if $x < s$, and $k = 2$ for $A^-$ or for $A^0$ if $x > s$.
$\|\mathrm{Ker}(A)\|$ is the length of the interval $\langle \inf(\mathrm{Ker}(A)), \sup(\mathrm{Ker}(A)) \rangle$. The
parameter $d$ was experimentally estimated to be a number $d \in \langle 0.25, 0.40 \rangle$.

Examples of some other linguistic modifiers can be found in the
literature.
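The composition (11) for the modifier very can be sketched as follows. Everything specific in this example is an assumption made for illustration: the piecewise linear meaning of "big" on a scale from 0 to 100, its kernel from 80 to 100, the choice $d = 0.25$, the squaring used for CON, and the restriction to positive and negative adjectives (zero adjectives, where $k$ depends on the position of $x$ relative to $s$, are left out).

```python
def big(x):
    """Hypothetical meaning of the positive adjective 'big' on the scale 0..100."""
    if x <= 50:
        return 0.0
    if x >= 80:
        return 1.0
    return (x - 50) / 30


def make_very(A, kernel, positive, d=0.25):
    """Build 'very A' as CON composed with A composed with a displacement."""
    low, high = kernel
    shift = d * (high - low)      # d times the length of the kernel
    k = 1 if positive else 2      # k = 1 for a positive, k = 2 for a negative adjective

    def very_A(x):
        displaced = x + (-1) ** k * shift
        return A(displaced) ** 2  # CON taken here as squaring (an assumption)

    return very_A


very_big = make_very(big, kernel=(80, 100), positive=True)
```

With these choices the displacement is 5, so `very_big(70)` evaluates `big(65) = 0.5` and squares it to 0.25: the curve of "very big" lies to the right of the curve of "big" and is steeper.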

The meaning of verbs is a very complicated problem and so far, only
the copula "to be" in syntagms such as

$$p := \mathcal{V} \text{ is } \mathcal{A} \qquad (12)$$

is modelled, where $\mathcal{A}$ is usually the syntagm (10). However, (12) is interpreted
as a simple assignment rather than a verb. The $\mathcal{V}$ in (12) is a noun but it is
usually not treated as such. Thus, we obtain two ways in which the meaning of (12)
can be modelled.

⁷ Some authors simplify this model by putting $\zeta_m = \mathrm{id}_U$ (the identity function on $U$).

a) We put

$$M(p) = M(\mathcal{A}) = A,$$

i.e. the meaning of $p$ is set equal to the meaning of the syntagm $\mathcal{A}$. This is quite
reasonable since, as was stated above, we usually need not know the meaning
of the noun $\mathcal{V}$ in the applications.

b) Let $M(\mathcal{V}) = P \subseteq V$. Then we put

$$M(p) \subseteq P \times A \qquad (13)$$

where $P \times A$ is the Cartesian product of fuzzy sets defined by

$$(P \times A)\langle x, y \rangle = Px \wedge Ay.$$

The inclusion in (13) may be proper or improper depending on the
kind of the noun $\mathcal{V}$. The relation (13) means that each element $x$ from the
universe $V$ (a representative of the noun $\mathcal{V}$) is assigned an attribute $y$ from the
universe $U$ where $A \subseteq U$.⁸

As we are in a fuzzy environment, it seems reasonable to take the
resulting membership degree of the couple $\langle x, y \rangle$ as the minimum of $Px$ and $Ay$.

L. A. Zadeh [22,21] and some other authors following him suggest
interpreting the membership degree $(P \times A)\langle x, y \rangle$ as a possibility degree of the
fact that $x$ is $A$. However, the possibility degree concerns uncertainty which,
in our opinion, does not reflect the vagueness phenomenon contained in the
semantics of natural language.

Very important are the conditional sentences of the form

$$C := \text{IF } p \text{ THEN } q \qquad (14)$$

where $p$ and $q$ are syntagms of the form (12). In fuzzy set theory we usually put

$$M(C) = M(p) \ominus M(q), \qquad (15)$$

i.e. we interpret the implication (14) using the residuum operation between
fuzzy sets $M(p)$ and $M(q)$.

The interpretation of the conditional sentences (14) plays the crucial
role in so-called approximate reasoning, which is one of the most successfully
applied areas of fuzzy set theory (see e.g. [8,9]). Let us remark that many
authors interpret (14) as the Cartesian product

$$M(C) = M(p) \times M(q). \qquad (16)$$

In the applications of approximate reasoning, this may work since all the
fuzzy methods are very robust. However, putting the meaning of the implication
(14) equal to the Cartesian product is linguistically as well as logically incorrect
since (16) is symmetric and the implication is not. Another reason why (16)
often works in practical applications may be the fact that the implication
(14) often describes only some kind of relation between the input and output
and is not, in fact, understood to be an implication. This discrepancy still
needs more analysis.

⁸ $U$ is usually the real line. For example, in syntagms such as Peter is tall, the element
Peter is assigned its height, which is a real number.
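The difference between the two readings of a conditional sentence can be made concrete on small finite universes. In the sketch below the residuum between fuzzy sets on two different universes is taken pairwise over all couples $\langle x, y \rangle$; this reading, the function names and the sample fuzzy sets are assumptions of the illustration. Exact fractions avoid rounding effects in the comparison.

```python
from fractions import Fraction as Fr


def implication_relation(P, Q):
    # Residuum 1 ^ (1 - Px + Qy) for every couple <x, y>.
    return {(x, y): min(1, 1 - P[x] + Q[y]) for x in P for y in Q}


def cartesian_product(P, Q):
    # Minimum Px ^ Qy for every couple <x, y>.
    return {(x, y): min(P[x], Q[y]) for x in P for y in Q}


def is_symmetric(build, P, Q):
    """True if swapping antecedent and consequent only transposes the relation."""
    forward = build(P, Q)
    backward = build(Q, P)
    return all(forward[(x, y)] == backward[(y, x)] for (x, y) in forward)


# Hypothetical meanings of an antecedent and a consequent.
P = {"low": Fr(1), "high": Fr(1, 5)}
Q = {"slow": Fr(3, 10), "fast": Fr(1)}
```

For these sets the Cartesian product gives the same degrees whichever syntagm is taken as the antecedent, whereas the residuum gives 3/10 for "IF low THEN slow" but 1 for "IF slow THEN low" at the corresponding couple.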

Let us also mention the problem of linguistic quantifiers (we will denote
them by the letter $\mathcal{Q}$). They do not form a uniform group from the linguistic
point of view. We place among them numerals including the indefinite ones (e.g.
several), some adverbs (e.g. many, few, most), some pronouns (e.g. every), some
nouns (e.g. majority, minority) and others. From the point of view of fuzzy set theory,
their meaning generally is a fuzzy number, i.e. a fuzzy set $M(\mathcal{Q}) = Q \subseteq Re$ in
the real line. It is proposed in [22] how to interpret syntagms of the form

$$\mathcal{Q}\mathcal{A} \qquad (17)$$

or

$$\mathcal{Q}\mathcal{A}\text{'s are } \mathcal{B}\text{'s}. \qquad (18)$$

In (17), the quantifier $\mathcal{Q}$ is interpreted as a fuzzy characterization of the absolute
fuzzy cardinality (7) of the fuzzy set

$$A = M(\mathcal{A})$$

while in (18), it characterizes the relative fuzzy cardinality (8) of $M(\mathcal{A})$ with
respect to $B = M(\mathcal{B})$. More exactly, we put

$$M(\mathcal{Q}\mathcal{A}) = Q \cap \mathrm{FCard}(A) \qquad (19)$$

and

$$M(\mathcal{Q}\mathcal{A}\text{'s are } \mathcal{B}\text{'s}) = Q \cap \mathrm{FCard}_A(B). \qquad (20)$$

However, the problem is not yet settled since (19) and (20) make sense only for
fuzzy sets with finite support.

In the literature on fuzzy set theory (see e.g. [19,20,22,13] and others),
one may find the semantics of compound syntagms of the form

$$\mathcal{A} \text{ and } \mathcal{B} \qquad (21)$$

and

$$\mathcal{A} \text{ or } \mathcal{B} \qquad (22)$$

defined using the operations of intersection and union of fuzzy sets, respectively.
However, this is only a tentative solution since the syntagms (21) and (22) are
special cases of the very complicated phenomenon known in linguistics as
coordination. The use of the operations of intersection and union of fuzzy sets
may work in some special cases of close coordination in syntagms, e.g. Peter
and Paul ..., Old and dirty car ... etc.

An even worse situation is encountered with negation. The simple use
of the operation of complement of fuzzy sets works only with some kinds of
adjectives and nouns. However, negation contained in more complex syntagms
is incidental to the phenomenon of topic-focus articulation, where only the focus is
negated. A fully comprehensive description of this phenomenon in linguistics
is, however, still not available.

4 FEW COMMENTS TO THE APPLICATIONS AND 
LINGUISTIC APPROXIMATION 

The theory presented so far has found many interesting applications,
especially in the models connected with so-called approximate reasoning.
However, we are quite far from grasping the semantics of natural language
more comprehensively and much work has still to be done.

A very important concept which deserves to be mentioned here is that
of a linguistic variable [20]. This concept made it possible to see a certain part
of linguistics from a more technical point of view.

A linguistic variable is, in general, a quintuple

$$\langle X, T(X), U, G, M \rangle$$

where $X$ is a name of the variable, $T(X)$ is its term-set, $U$ is the universe, $G$
the syntactic and $M$ the semantic rules, respectively. For example,

$$X := \text{size},$$

$$U := \langle 0, 1000 \rangle,$$

$G$ is a certain, usually context-free grammar generating the set $T(X)$ of terms
such as big, small, very big, rather average etc. and $M$ is the semantic rule
assigning to each term $\mathcal{A} \in T(X)$ its meaning, which is a fuzzy set

$$M(\mathcal{A}) \subseteq U.$$

Linguistic variables play an important role in applications. For example, the
parameters of a technical system such as temperature, speed, weight etc. can be
understood to be linguistic variables. As their values can also be crisp (e.g. an
exactly stated number), the concept of a linguistic variable is general enough to
capture also the classical concept of the variable.

In applications, one may also meet the problem of linguistic
approximation. We may state it as follows. Let $T$ be a set of syntagms of natural
language, e.g. the term-set of a linguistic variable, and let a fuzzy
set $A_0' \subseteq U$ be given. Our task is to find a syntagm $\mathcal{A}_0 \in T$ such that its meaning
$M(\mathcal{A}_0) = A_0$ is as close to $A_0'$ as possible.

There are many ways to solve this task. However, no sufficiently
efficient and general method is known so far. One of the possible procedures
is the following.

Let $f : \langle 0, 1 \rangle \to \langle 0, 1 \rangle$ be a smooth, increasing, and measurable
function. Put

$$R\langle A, B \rangle = 1 - \frac{\int_{\mathrm{Supp}(A) \cup \mathrm{Supp}(B)} df(Ax - Bx)}{\int_{\mathrm{Supp}(A)} df(Ax) + \int_{\mathrm{Supp}(B)} df(Bx)} \qquad (23)$$

Then we may find $\mathcal{A}_0 \in T$ such that

$$R\langle A_0, A_0' \rangle$$

is maximal.

If $T$ has a small number of elements then we may also find $\mathcal{A}_0$ such that

$$\sum_{x \in U} |A_0 x - A_0' x|^p \qquad (24)$$

is minimal for some suitable, previously set number $p$. This method is often
used in technical applications.
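For a small term-set on a finite universe, the criterion (24) amounts to a direct search. The following Python sketch illustrates this under stated assumptions: fuzzy sets are dictionaries over the same universe, the function names are chosen here, and the three terms with their membership degrees are hypothetical.

```python
def distance(A, B, p=2):
    """Sum over the universe of |Ax - Bx|^p, as in (24)."""
    return sum(abs(A[x] - B[x]) ** p for x in A)


def approximate(target, term_meanings, p=2):
    """Return the term whose meaning minimizes (24) with respect to target."""
    return min(term_meanings, key=lambda term: distance(term_meanings[term], target, p))


# Hypothetical term-set on the universe {0, 1, 2, 3, 4}.
terms = {
    "small":   {0: 1.0, 1: 0.7, 2: 0.3, 3: 0.0, 4: 0.0},
    "average": {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5, 4: 0.0},
    "big":     {0: 0.0, 1: 0.0, 2: 0.3, 3: 0.7, 4: 1.0},
}

# Fuzzy set to be described linguistically.
target = {0: 0.0, 1: 0.1, 2: 0.4, 3: 0.8, 4: 1.0}
```

Here `approximate(target, terms)` selects "big", whose distance to the target (about 0.03 for `p = 2`) is far smaller than that of the other two terms. The search is exhaustive, which is why it suits only term-sets with few elements.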

A quite effective procedure was proposed by F. Eshragh and E. H.
Mamdani in [5]. This procedure is suitable for syntagms of the form (21) and (22)
where $\mathcal{A}$ and $\mathcal{B}$ may consist of an adjective, a noun and a linguistic modifier,
and the universe $U$ is ordered. According to this method, the membership
function $A_0'$ is divided into parts by the effective turning points (i.e. special points
where the membership function changes its course), the parts are approximated
by the above partial syntagms $\mathcal{A}$, $\mathcal{B}$ using (24) and the resulting syntagm is
obtained by joining $\mathcal{A}$ and $\mathcal{B}$ using the corresponding connective. In particular,
if two neighbouring parts of the membership function form a "hill" then the
corresponding syntagms are joined by the connective and, and if they form a
"valley" then they are joined by the connective or. Note that this works only
in the case when the connective and is interpreted as the intersection and or as
the union of fuzzy sets.





5 CONCLUSION 

We have briefly presented the main ideas of the modelling of natural
language semantics using fuzzy set theory. We attempted to demonstrate how
the semantics of some of the basic units can be interpreted, namely the semantics
of nouns, adjectives, selected adverbs and the copula "to be". Moreover, the
semantics of some cases of close coordination (the use of connectives) was
also touched on, along with the semantics of conditional sentences. Let us stress
that we are still far from grasping the meaning of more complex syntagms,
and even of simple clauses when they contain a verb. The reason lies in the
extreme complexity of verb semantics, since verbs represent the most important
units of our language, through which humans record the surrounding
world at the highest level of their intellectual capability.

Some work in this respect is done in [14] where, however, the new world
of mathematics called the alternative set theory (AST) is used. Fuzzy set theory
serves there as a special technical tool which is used at a second stage, after the
semantics of a sentence (syntagm) in the frame of AST is formed.

Despite the above facts, the use of fuzzy sets in modelling of natural
language semantics has already found many successful applications. This is a
convincing argument in favour of the usefulness of fuzzy set theory.
INTRODUCTION

According to Harre (1972) there are two major purposes of models in
science: 1) logical, which enables certain inferences that would not
otherwise be possible; and 2) epistemological, to express and extend
our knowledge of the world. Models are helpful for explanation and theory
formation, as well as simplification and concretization. Zimmermann (1980)
classifies models into three groups: 1) formal models (purely axiomatic systems 
with purely fictitious hypotheses), 2) factual models (conclusions from the models 
have a bearing on reality and they have to be verified by empirical evidence), and 3) 
prescriptive models (which postulate rules according to which people should 
behave). The quality of a model depends on the properties of the model and the 
functions for which the model is designed (Zimmermann, 1980). In general, good 
models must have three major properties: 1) formal consistency (all conclusions 
follow from the hypothesis), 2) usefulness, and 3) efficiency (the model should 
fulfill the desired function at a minimum effort, time and cost). 

Although the usefulness of the mathematical language for modeling 
purposes is undisputed, there are limits of the possibility of using the classical 
mathematical language which is based on the dichotomous character of set theory 
(Zimmermann, 1980). Such restriction applies especially to the man-machine 
systems. This is due to vagueness of the natural language, and the fact that in 
empirical research natural language cannot be substituted by formal languages. 
Formal languages are rather simple and poor, and are useful only for specific 
purposes. Mathematics and logic as research languages widely applied today in 
natural sciences and engineering are not very useful for modeling purposes in
behavioral sciences and especially in human factors studies. Rather, a new 
methodology, based on the theory of fuzzy sets and systems is needed to account 
for the ever present fuzziness of man-machine systems. 

As suggested by Smithson (1982), the potential advantages for 
applications of a fuzzy approach in human sciences are: 1) fuzziness, itself, may be 
a useful metaphor or model for human language and categorizing processes, and 2) 
fuzzy mathematics may be able to augment conventional statistical techniques in 
the analysis of fuzzy data. Fuzzy methods are useful supplements for statistical 
techniques such as reliability analysis and regressions, and structurally oriented 
methods such as hierarchical clustering and multidimensional scaling. 

HUMAN FACTORS 

Human factors discipline is concerned with "the consideration of human 
characteristics, expectations, and behaviors in the design of the things people use 
in their work and everyday lives and of the environments in which they work and 
live" (McCormick, 1970). The "things" that are designed are complex man- 
machine systems. According to Pew and Baron (1983) the ultimate reasons for 
building models in general, and man-machine models in particular, are to provide 
for 

1. A systematic framework that reduces the memory load of the 
investigator, and prompts him not to overlook the important 
features of the problem, 

2. A basis for extrapolating from the information given to draw new 
insights and new testable or observable inferences about system or 
component behavior, 

3. A system design tool that permits the generation of design 
solutions directly, 

4. An embodiment of concepts or derived parameters that are useful as 
measures of performance in the simulated or real environment, 

5. A system component to be used in the operational setting to 
generate behavior, for comparison with the actual operator behavior 
to anticipate a display of needed data, to introduce alternative 
strategies or to monitor operator performance, and 

6. Consideration of otherwise neglected or obscure aspects of the 
problem. 

According to Topmiller (1981), research in man-machine systems poses 
an important methodological challenge. This is due to the complexity of such 
systems, and a need for simultaneous consideration of a variety of interacting 
factors that affect several dimensions of both individual and group performance. 
Chapanis (1959) argues that "we do not have adequate methods for finding out all 
the things we need to know about people. Above all, we need novel and 
imaginative techniques for the study of man. This is an area in which behavioral 
scientists can learn much from the engineering and physical sciences."

Research techniques applied in man-machine research typically include the
following methods: 1) direct observation (operator opinions, activity sampling
techniques, process analysis, etc.), 2) accident study method (risk analysis,
critical-incident technique), 3) statistical methods, 4) experimental methods (design of
experiments), 5) psychophysical methods (psychophysical scaling and
measurement), and 6) articulation testing methods (Chapanis, 1959). Today we are
still at the beginning stage of building robust mathematical models for the analysis
of complex human-machine systems. This is partially due to the lack of an appropriate
design theory, as well as the complexity of human behavior (Topmiller, 1981). The
human being is too complex a "system" to be fully understood or describable in all
his/her properties, limits, tolerances, and performance capabilities, and no
comprehensive mathematical tool has been available up to now to describe and
integrate all the above mentioned measures and findings about human behavior
(Bernotat, 1984).

FUZZY MODELS 

Human work taxonomy can be used to describe five different levels 
ranging from primarily physical tasks to primarily information processing tasks 
(Rohmert, 1979). These are: 

1) producing force (primarily muscular work), 

2) continuously coordinating sensory-monitor functions (like 
assembling or tracking tasks), 

3) converting information into motor actions (e.g. inspection tasks), 

4) converting information into output information (e.g. required control 
tasks), and 

5) producing information (primarily creative work). 

Regardless of the level of human work, three types of fuzziness are 
present and should be accounted for in modeling of man-machine systems, i.e.,: 1) 
fuzziness stemming from our inability to acquire and process adequate amounts of 
information about the behavior of a particular subsystem (or the whole system), 2) 
fuzziness due to vagueness of the relationships between people and their working 
environments, and complexity of the rules and underlying principles related to such 
systems, and finally, 3) fuzziness inherent in human thought processes and 
subjective perceptions of the outside world (Karwowski and Mital, 1986). Figure 
1 illustrates the above thesis. Traditional man-machine interfaces, which include: 
1) information sensing and receiving, 2) information processing, 3) decision- 
making, 4) control actions, and 5) environmental and situational variables, are 
represented in two blocks, i.e., human interpretation block and a complex work 
system block. 
Figure 1. Fuzziness in man-machine interfacing (after Karwowski
and Mital, 1986).


Uncertainty (looked upon in the context of mental workload), which
causes unpredictability in one's stimulus and/or response, enters a work situation
from several sources (Audley et al., 1979). These are: 1) external disturbance 
model, 2) varying parameters of the system structure external to the human 
operator, 3) human produced noise in observing the task stimuli, 4) lack of good 
internal model of the external system, 5) human-produced distortions in 
interpreting the externally stipulated criterion of performance, and 6) human- 
produced motor noise. 

In view of the above, the theory of fuzzy sets offers a useful approach 
when the task demands are vague, with the main advantage being its ability to 
model imprecise task situations and, therefore, a potential to develop a framework 
for implementation of workload measures. 
FUZZINESS AND HUMAN-MACHINE SYSTEMS

Man-machine studies aim to optimize work systems with respect to 
physical and psychological characteristics of the users, and investigate complex and 
ill-defined relationships between people, machines, and physical environments. 
The main goal of such investigation is to remove the incompatibilities between 
humans and tasks, and to make the workplace healthy, productive, comfortable and 
satisfying. 

Human-centered systems, which are the objects of man-machine studies, 
are very complex and difficult to analyze. There are at least three different types of 
uncertainty inherent to such systems; i.e., inaccuracy, randomness, and vagueness 
(Bezdek, 1981). Uncertainties due to inaccuracy are related to observations and 
measurements (representations), while those due to randomness (of events) are 
independent from observations and constitute an objective property of some real 
process. Uncertainty due to vagueness (or fuzziness) has to do with the complexity 
of the system under investigation and the human thought and perception processes 
(Zadeh, 1973). 

A new methodology in the area of man-machine studies is needed to account for 
imprecision and vagueness of such relationships. Zadeh (1974) points out that 
"Although the conventional mathematical techniques have been and will continue 
to be applied to the analysis of humanistic systems, it is clear that the great 
complexity of such systems call for approaches that are significantly different in 
spirit as well as in substance from the traditional methods - methods which are 
highly effective when applied to mechanistic systems, but are far too precise in 
relation to systems in which human behavior plays an important role." 

In the past, most of the traditional methodologies disregarded the system 
complexities, and assumed that the formal properties of mathematics correspond to 
existing relationships characteristic to the system under investigation (Zadeh, 
1974). For example, an uncertainty due to vagueness was often modeled as being 
of stochastic nature. Such treatment appears to defeat the purpose of any formal 
man-machine systems' analysis and modeling efforts. 

The concept of fuzziness 

Fuzziness relates to the specific kind of vagueness having to do with 
gradations in categories, i.e., degree of vagueness (Smithson, 1982). Uncertainty 
measured by fuzziness refers to the gradation of membership of an element in some 
class (category). Although such uncertainty arises at all levels of cognitive 
processes, people have the abilities to understand and utilize vague and imprecise 
concepts which are difficult to analyze within the framework of traditional 
scientific thinking (Hersh et al., 1976; Kramer, 1983; Karwowski and Mital, 
1986). Therefore, awareness of vagueness and inexactness, implicit in human 
behavior, should be the basis of any man-machine studies. 

According to Zadeh (1965), the theory of fuzzy sets represents an attempt 
to construct a conceptual framework for a systematic treatment of vagueness 
and uncertainty due to fuzziness in both quantitative and qualitative ways. Such 
a framework is much needed in the human-machine interaction area. As pointed out 
by Singleton (1982), "most human characteristics have very complex contextual 
dependencies which are not readily expressible in tabulations of numbers even in 
multivariate equations." Yet, there is growing evidence that people comprehend 
vague concepts, such as concepts of a natural language, as if those concepts were 
represented by fuzzy sets, and can manipulate them according to the rules of fuzzy 
logic (Oden, 1977; Brownell et al., 1978). Recent research in semantic memory and 
concept formation indicates that natural categories are fuzzy sets with no clear 
boundaries separating category members from nonmembers (McCloskey et al., 
1978). One can certainly understand the meaning of such concepts as "excessive 
workload," "low illumination," "heavy weight," "high level of stress," and "tall 
man," to name a few commonly used descriptors of the human-environment 
relationship. 

As noted by Singleton (1982), "no one has yet developed a comprehensive 
set of crude and approximate but simple and inexpensive techniques finding 
solutions to ergonomics problems." Fuzzy set theory, which allows interpretation 
and manipulation of imprecise (vague) information and recognition and evaluation 
of uncertainty due to fuzziness (in addition to randomness), may be the closest 
solution to the above stated need available today. 

Conventional versus fuzzy set theory and logic 

In a conventional (classical) set theory, an element x either belongs or 
does not belong to a set X, and the characteristic (membership) function $f_X$ can be 
represented as follows: 

$$
f_X(x) =
\begin{cases}
1 & \text{if } x \in X \quad (\text{truth value} = 1\text{: true}) \\
0 & \text{if } x \notin X \quad (\text{truth value} = 0\text{: false})
\end{cases}
$$

The concept of fuzzy set extends the range of membership values for $f_X$, 
and allows graded membership, usually defined on an interval [0, 1]. 
Consequently, an element may belong to a set with a certain degree of 
membership, not necessarily 0 or 1. The "excluded middle" concept is then 
abandoned, and more flexibility is given in specifying the characteristic function. 
In view of the above, the mathematical logic can also be modified. Interestingly, 
the classical logic was actually extended as early as 1930 by Lukasiewicz, who 
proposed the infinite-valued logic. As stated by Giles (1981), "Lukasiewicz logic 
is exactly appropriate for the formulation of the 'fuzzy set theory' first described by 
Zadeh; indeed, it is not too much to claim that it is related to fuzzy set theory exactly 
as classical logic is related to ordinary set theory." 
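
The contrast between the two characteristic functions can be sketched in a few lines of Python. This is only an illustration: the fuzzy set "tall man" and its 160 cm and 190 cm breakpoints are assumptions made for the example, and the min/max/complement connectives are the standard fuzzy-set operators rather than anything prescribed in this paper. The last function is the Lukasiewicz implication.

```python
def crisp_membership(x, members):
    """Classical characteristic function: 1 if x belongs to the set, else 0."""
    return 1 if x in members else 0


def tall(height_cm):
    """Hypothetical graded membership in the fuzzy set 'tall man'.

    The 160 cm and 190 cm breakpoints are illustrative assumptions only.
    """
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0


def fuzzy_not(a):
    return 1.0 - a


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def lukasiewicz_implies(a, b):
    """Lukasiewicz implication on truth values in [0, 1]."""
    return min(1.0, 1.0 - a + b)


grade = tall(175)
# In the crisp case "A or not A" is always 1; with graded membership it need not be.
excluded_middle = fuzzy_or(grade, fuzzy_not(grade))
```

For a height of 175 cm the assumed membership grade is 0.5, so "tall or not tall" evaluates to 0.5 instead of 1, which is one way to see why the excluded middle no longer holds under these connectives.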

The theory of fuzzy sets has been successfully applied in the modeling of 
ill-defined systems in a variety of disciplines (cognitive psychology, information 
processing and control, decision-making sciences, biological and medical sciences, 
sociology and linguistics, image processing and pattern recognition, and artificial 
intelligence). 

Willaeys and Malvache (1979) investigated the perception of visual and 
vestibular information in "watch and decision" or industrial inspection (control) 
tasks. The imprecise nature of the human problem solving procedures was related 
to the "shaded" strategy of the operator's perception and to the "hard-to-predict" 
environment of the man-machine environment. The labels of fuzzy sets used by 
the operator to describe different physical variables of the task were identified, and 
the fuzzy model of the process-control task was formulated. 

Benson (1982) developed an interactive computer graphics program for 
analytical tasks which are not well defined or utilize imprecise data. Color scales 
were used to model subjectively defined categories under investigation. Such fuzzy 
categories were then presented to the analyst. The use of a linguistic approach 
allowed the identification of membership for different categories of description of 
visual inspection. The perceptual properties of color proved to be useful in 
selective focus attention and in distinguishing or disregarding variations between 
imprecisely defined categories. 

Karwowski and others (1988, 1984a and 1984b) developed a fuzzy set 
based model to assess the acceptability of stresses in manual lifting tasks. 
Measures of acceptability were expressed in terms of membership functions which 
described the degrees to which the combined effect of biomechanical and 
physiological stresses were acceptable to the human operator. The combined 
acceptabilities of a lifting task were similar to the subjective estimations of the 
overall task acceptability established by the subjects in psychophysical 
experiments. 

Terano et al. (1983) introduced a fuzzy set approach into fault-tree 
analysis, and studied the fuzziness of a human-reliability concept from the man- 
machine systems safety point of view. Kramer and Rohr (1982) developed a fuzzy 
model of driver behavior based on simulated visual pattern processing in lane 
control. Saaty (1977) distinguished two types of fuzziness: fuzziness in perception 
(for example, perception of illumination intensity) and fuzziness in meaning, 
advocating that fuzziness is a basic quality of understanding. Hirsh et al. (1981) 
used a fuzzy dissimilitude relation to describe human vocal patterns. 

FUZZY-SET THEORETIC MODELING 
OF HUMAN-COMPUTER INTERACTION 

The interaction between people and computers reflects the cognitive 
imprecision of the data and uncertainty exhibited in the user's perception of the 
computing environment, including the limitations of the computer software used. 
Since human reasoning is not precise, the human-computer interaction (HCI) 
should be imprecision-tolerant, and should allow for the inexact mode of 
communication (Karwowski et al., 1990). Recent developments in fuzzy 
methodologies, fuzzy computing, and fuzzy hardware (computers based on fuzzy 
logic processing units) have created a set of new possibilities for the development of 
vagueness-tolerant human-computer interfaces. 
Human-computer interaction system 

The human-computer interaction system (HCIS) can be formally defined 
(Karwowski et al., 1990) as a quintuple: 


HCIS = (T, U, C, E, I)                                              (1) 

where: T - task requirements (physical and cognitive) 
       U - user characteristics (physical and cognitive) 
       C - computer characteristics (hardware and software, including 
           computer interfaces) 
       E - an environment 
       I - a set of interactions. 

The set of interactions I embodies all possible interactions between T, 
U, C in E regardless of their nature or strength of association. For example, one 
of the possible interactions can relate to the data stored in the computer memory 
and the corresponding knowledge, if any, of the user. The interactions I can be 
elemental, i.e. one to one association, or complex, such as an interaction between 
the user, the particular software used to achieve the desired task, and available 
physical interface with the computer. Also, the elemental interactions do not have 
to directly involve the user. For example, the interaction may involve only T and 
C components. It should be pointed out that the elemental interaction between U 
and C reflects the narrow concept of the traditional human-computer interface. 

In human-computer interaction, the uncertainty and imprecision due to 
vagueness (or fuzziness) stems from the high complexity of human-computer 
systems as well as the nature of computer user's perception and thought processes. 
As pointed out by Zadeh (1973), the key elements in human thinking are linguistic 
descriptors, or labels, of classes of objects with gradation of membership of their 
elements, i.e., fuzzy sets. Furthermore, the human reasoning is approximate rather 
than exact, and is based upon a logical system with fuzzy truths, connectives, and 
fuzzy rules of inference (Lakoff, 1973; Kochen, 1975; Hersh and Caramazza, 1976; 
Mamdani and Gaines, 1981; Schmucker, 1984; Karwowski and Mital, 1986; 
Smithson, 1987). 

Fuzziness in HCI research 

Recently, there have been some initial attempts to incorporate fuzziness 
in the HCI research. Simcox (1984) presented a method to determine compatibility 
functions that describe the degree of implied attribute of the visual display and the 
linguistic category that summarizes values of this attribute. Such compatibility 
functions were postulated to be useful in the construction of the computer graphs 
as a communication mode. Boy and Kuss (1986) have proposed a fuzzy method for 
modeling of human-computer interactions in information retrieval tasks, and 
implemented their method in the computer-based library retrieval system (BIBLIO). 
Recently, Hesketh et al. (1988) developed a computerized method for fuzzy graphic 
rating scale using the FUZRATE program which feeds back to the user his/her 
fuzzy ratings, and then presents the results of combining these ratings. 
THE GOMS MODEL 

One of the recently proposed models of computer user's information- 
processing is the GOMS concept (Card et al., 1983). According to the GOMS 
model, the user's cognitive structure consists of four components: 1) a set of 
Goals, 2) a set of Operators, 3) a set of Methods for achieving the goals, and 4) a set 
of Selection Rules for choosing among competing methods for goals. These 
components can be further defined as follows: 

1) Goals: A goal is a symbolic structure that defines a state of affairs to 
be achieved and determines a set of possible methods by which it may 
be accomplished. 

2) Operators: Operators are elementary perceptual, motor, or cognitive 
acts, whose execution is necessary to change any aspect of the user's 
mental state or to affect the task environment. 

3) Methods: Methods describe procedures used by the user to accomplish 
a goal. Methods have a chance of success distinctly less than certain, 
because of the user's lack of knowledge or appreciation of the task 
environment. This uncertainty is a prime contributor to the problem-solving 
character of a task; its absence is a characteristic of a 
cognitive skill. 

4) Selection Rules: Rules for predicting from knowledge of the task 
environment which of several possible methods will be selected by 
the user in order to accomplish a specific goal. 

In 1983, Card et al. devised a text-editing experiment to show the validity 
of the GOMS model. In one experiment, subjects were told to perform simple line 
location tasks, and the methods that each subject used to locate a line were recorded. 
From sample editing sessions, the methods and the associated selection rules for 
locating a line were inferred. The study concluded that the GOMS knowledge 
representations were valid for such tasks. 


FUZZY GOMS MODELING 

Card et al. (1983) noted possible extensions of the GOMS model. 
Among these were "the assurance that a GOMS description can be given for a 
display oriented editor" and methods for improving the accuracy of the predictions 
of user's actions. It was also suggested that "the probabilistic selection rules and 
conditionalities for predicting which method the user will employ and for 
expressing probabilistic conditionality within those methods" be explored. 

Another enhancement to the GOMS model would be the ability to 
account for uncertainty within selection rules (Karwowski et al., 1989). The 
original GOMS study inferred, from user behavior, rules such as "If the number of 
lines to the next modification is less than 3 then use the LF-METHOD; else use 
the QS-METHOD." This type of rule assumes perfect knowledge and absolute 
certainty of the user's cognitive ability to observe, at a glance, the number of lines 
to the next change. 
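
The difference between such a crisp selection rule and a fuzzy counterpart can be sketched as follows. The crisp function encodes the rule quoted above; the fuzzy class "near" and its breakpoints (full membership up to 2 lines, none from 6 lines on) are purely hypothetical and stand in for values that would have to be elicited from the user.

```python
def crisp_select(lines_to_next_change):
    """Crisp rule quoted in the text: fewer than 3 lines -> LF-METHOD."""
    return "LF-METHOD" if lines_to_next_change < 3 else "QS-METHOD"


def near_membership(perceived_lines):
    """Hypothetical membership of a distance in the fuzzy class 'near'.

    Both breakpoints (2 and 6 lines) are illustrative assumptions,
    not elicited values.
    """
    if perceived_lines <= 2:
        return 1.0
    if perceived_lines >= 6:
        return 0.0
    return (6 - perceived_lines) / 4.0


def fuzzy_select(perceived_lines):
    """Return the method whose antecedent has the larger membership grade."""
    near = near_membership(perceived_lines)
    far = 1.0 - near  # complement used as the 'not near' antecedent
    if near >= far:
        return ("LF-METHOD", near)
    return ("QS-METHOD", far)
```

At a distance of 3 lines the crisp rule already switches to the QS-METHOD, whereas the fuzzy sketch still favors the LF-METHOD with grade 0.75, reflecting a user who does not count lines exactly.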

In order to account for the natural fuzziness of the above human-computer 
interactions, the GOMS model was recently extended by allowing its components 
to assume precise, probabilistic or fuzzy values. Such a preliminary generalization 
of the GOMS model, proposed by Karwowski et al. (1990), is given in Table 1. 
The Goals, Operators and Methods components can be either precise or fuzzy, 
while the Selection Rules are expressed in either a probabilistic or a fuzzy (as 
possibilistic or linguistic inexactness) manner. 


Table 1. Generalized computer user's cognitive structure based on GOMS 
model and nature of model components (after Karwowski et al., 1990). 

Structure   Goals           Operators          Methods         Selection rules
category    (description)   (nature of acts)   (description)   (reasoning processes)
---------   -------------   ----------------   -------------   ---------------------
 1          Precise         Precise            Precise         Probabilistic
 2          Precise         Precise            Precise         Fuzzy
 3          Precise         Precise            Fuzzy           Probabilistic
 4          Precise         Precise            Fuzzy           Fuzzy
 5          Precise         Fuzzy              Precise         Probabilistic
 6          Precise         Fuzzy              Precise         Fuzzy
 7          Precise         Fuzzy              Fuzzy           Probabilistic
 8          Precise         Fuzzy              Fuzzy           Fuzzy
 9          Fuzzy           Precise            Precise         Probabilistic
10          Fuzzy           Precise            Precise         Fuzzy
11          Fuzzy           Precise            Fuzzy           Probabilistic
12          Fuzzy           Precise            Fuzzy           Fuzzy
13          Fuzzy           Fuzzy              Precise         Probabilistic
14          Fuzzy           Fuzzy              Precise         Fuzzy
15          Fuzzy           Fuzzy              Fuzzy           Probabilistic
16          Fuzzy           Fuzzy              Fuzzy           Fuzzy
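
The sixteen structure categories of Table 1 are simply all combinations of the admissible values of the four components. As a minimal sketch (the variable names are chosen here for illustration only), they can be enumerated in the same order as the table:

```python
from itertools import product

# Admissible natures of each GOMS component, as described in the text.
GOALS = ("Precise", "Fuzzy")
OPERATORS = ("Precise", "Fuzzy")
METHODS = ("Precise", "Fuzzy")
SELECTION_RULES = ("Probabilistic", "Fuzzy")

# Category number -> (goals, operators, methods, selection rules).
categories = {
    number: combination
    for number, combination in enumerate(
        product(GOALS, OPERATORS, METHODS, SELECTION_RULES), start=1
    )
}

for number, combination in categories.items():
    print(number, *combination)
```

With this ordering, category 4 comes out as precise Goals and Operators with fuzzy Methods and fuzzy Selection Rules.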


FUZZY GOMS MODEL: PILOT STUDY 

The example presented below refers to the generalized GOMS structure 
category #4, where the set of Goals and the set of Operators are precisely defined, 
while the (predicted) Methods used by the subjects as well as specific Selection 
Rules applied to accomplish the editing task were based on fuzzy modeling 
concepts, including application of the linguistic values, fuzzy connectives and 
fuzzy logic, and possibilistic measures of uncertainty. Such a model is referred to as 
the Fuzzy GOMS model. 
Karwowski et al. (1989) reported an experiment performed to validate the 
fuzzy-based GOMS model for a text editing task. The experiment was a variation of 
the manuscript editing experiment by Card et al. (1983). The experiment consisted 
of the following steps: 

1. The subject performed a familiar text editing task using a screen 
editor (VI). 

2. The methods by which the subject achieved his goals (word location) 
as well as selection rules were elicited. 

3. It was established that many of the rules had fuzzy components. 

4. Several compatibility functions for fuzzy terms used by the subject 
were derived. 

5. The possibility measure was used to predict the methods that the 
subject would use. 

6. The selected methods were compared to non-fuzzy predictions and 
actual experimental data. 

The subject did not know the file to be edited. The task was performed 
from the subject's own office and desk. The subject was familiar with and 
regularly used the VI screen editor. 

Knowledge elicitation 

The knowledge engineers can use sample runs to infer the rules by which 
the subjects select their preferred methods of editing text. An additional benefit 
from a GOMS perspective would be in structuring knowledge elicitation. For 
example, the expert could be prompted to present the methods and the selection 
rules and respond in the following manner: "IF the condition X exists and the 
condition Y exists, THEN use method Z." For example, while performing a 
task, the subject could be asked to describe why he chose a particular method: 

[Subject: "The word (to be changed) is more than half of a screen down, so I 
will use the control-D method and then return-key to the word."] 

[Knowledge Engineer: "How strongly do you feel that it is more than half?"] 

[Subject: "Very strong, say 0.8."] 

The actual distance to the word was measured directly and found to be 39 
lines. So the degree of membership in the "more than half" class was 
0.8 for 39 lines. By having the subject perform many tasks while verbalizing the 
rules, the methods used and the membership functions of the fuzzy quantifiers can be found. 
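
One simple way to turn a handful of such elicited judgments into a membership function is piecewise-linear interpolation. The sketch below assumes this approach for illustration. Only the pair (39 lines, 0.8) comes from the dialogue above; the remaining points are hypothetical placeholders for further judgments by the same subject.

```python
from bisect import bisect_right

# (distance in lines, membership in "more than half of a screen").
# Only (39, 0.8) is taken from the text; the other points are hypothetical.
ELICITED = [(10, 0.0), (25, 0.3), (39, 0.8), (50, 1.0)]


def membership(lines, points=ELICITED):
    """Piecewise-linear interpolation between elicited (distance, grade) points."""
    xs = [x for x, _ in points]
    if lines <= xs[0]:
        return points[0][1]
    if lines >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, lines)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (lines - x0) / (x1 - x0)


print(membership(39))  # reproduces the elicited grade at 39 lines
```

Distances outside the elicited range are clamped to the first or last grade, which is a design choice of this sketch and not something reported in the study.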

Experimental methods for pilot study 

The results of a pilot study reported by Karwowski et al. (1989) are 
discussed here in detail. The subject utilized the following five methods to place 
the cursor on the word(s) to be changed: 1) Control-D: scrolls down one half of 
a screen; 2) Control-F: jumps to the next page; 3) Return Key: moves the 
cursor to the left side of the page and down one line; 4) Arrow Up or Down: 
moves the cursor directly up or down; and 5) Pattern Search: places the cursor on 
the first occurrence of the pattern. 

The subject verbalized five cursor placement rules and seven fuzzy 
descriptors. The following rules were used: 1) If the word is more than half of a 
screen from the cursor and on the same screen or if the word is more than half of a 
screen from the cursor and across the printed page then use method #1; 2) If the 
word is more than 70 lines and the pattern is not distinct then use method #2; 3) 
If the word is less than half of a screen and on the left half of the page use method 
#3; 4) If the word is less than half of a screen and on the right half of the page 
use method #4; and 5) If the word is distinct and more than 70 lines away use 
method #5. 

An example of a compatibility function, for the "right hand side of the 
screen" descriptor elicited in the experiment, is given in Figure 2. The knowledge 
engineer assumed that the subject did not have the perfect cognitive ability to 
divide a screen directly in half, and therefore elicited the knowledge as fuzzy 
knowledge. For all descriptors, the membership functions were defined over perceived 
numbers of lines or characters, except the distinct and non-distinct descriptors. The 
distinct and non-distinct descriptors were given as counts of failed pattern recognitions 
and served, basically, to predict the patience of the user. 


[Figure 2 plot: "Right hand side of the screen"; horizontal axis: number of characters from left hand side.] 

Figure 2. Fuzzy descriptor for the "right hand side of the screen" 
(after Karwowski et al., 1990). 
Example of rule selection procedure 

Once all the rules, methods, and corresponding membership functions 
had been elicited, the theory of possibility (Zadeh, 1978) was used to model the 
expert's rule selection process. For this purpose, each of the potential rules was 
assigned a possibility measure equal to the membership value(s) associated with it 
during the elicitation phase of experiment. The possibility measure $\pi(A)$ was 
defined after Zadeh (1978) as follows: 

$$
\pi(A) = \mathrm{Poss}\{X \text{ is } A\} = \sup_{u \in U} \min\{f_A(u), \pi_X(u)\}, \tag{2}
$$

where $\pi_X(u)$ is the possibility distribution induced by the proposition (X is Z), and 
A is a fuzzy set in the universe U. 
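
On a finite universe the supremum in Eq. (2) reduces to a maximum, so the possibility measure can be computed directly. The following sketch assumes a small discrete universe of distances (in lines); the membership grades of A and the possibility distribution of X are invented numbers used only to show the sup-min computation.

```python
def possibility(membership_A, distribution_X, universe):
    """Poss{X is A}: max over u of min(membership of u in A, possibility of u)."""
    return max(
        min(membership_A.get(u, 0.0), distribution_X.get(u, 0.0))
        for u in universe
    )


# Hypothetical universe of distances, in lines.
universe = [20, 25, 30, 35]
# Hypothetical membership grades of each distance in the fuzzy set A.
membership_A = {20: 0.1, 25: 0.3, 30: 0.5, 35: 0.7}
# Hypothetical possibility distribution of the variable X.
distribution_X = {20: 0.4, 25: 1.0, 30: 0.6, 35: 0.2}

print(possibility(membership_A, distribution_X, universe))
```

Here the pointwise minima are 0.1, 0.3, 0.5 and 0.2, so the possibility measure is 0.5, attained at a distance of 30 lines.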

The following sub-task is used to illustrate the process of predicting the 
rule selection based on the linguistic inexactness of expert's actions. Sub-Task: 
Move down 27 lines to a position in column 20. The following rules 
(R) apply: 

Rule #1: Membership value of more than half of the screen = 0.4 
[The possibility that the rule applies is 0.4.] 

Rule #2: Membership value of more than 70 lines = 0 
[The possibility that the rule applies is 0.] 

Rule #3: Membership value of less than half of the screen = 0.3, and 
Membership value of left hand side of the line = 0.4 
[The possibility that the rule applies is 0.3 and 0.4.] 

Rule #4: Membership value of right half of line = 0.9, and 
Membership value of less than half of the screen = 0.3 
[The possibility that the rule applies is 0.3 and 0.9.] 

Rule #5: Membership value of more than 70 lines = 0 
[The possibility that the rule applies is 0.] 

The possibility measure of the possibility distribution of X that the 
subject would select a given rule from the universe of available rules R was 
defined after Zadeh (1978). In the case of the example cited above, the most applicable 
rule was derived based on the possibility measure of (X is Rule #) as follows: 

Poss (X is Rule #) 
  = MAX [{(Rule #1, 0.4)}, {(Rule #2, 0)}, MIN {(Rule #3, 0.3), (Rule #3, 0.4)}, 
         MIN {(Rule #4, 0.3), (Rule #4, 0.9)}, {(Rule #5, 0)}] 
  = MAX [{(Rule #1, 0.4)}, {(Rule #2, 0)}, {(Rule #3, 0.3)}, {(Rule #4, 0.3)}, 
         {(Rule #5, 0)}] 
  = {(Rule #1, 0.4)}. 

Given the set of five applicable rules (R), the possibility of selecting 
Rule #1 as the most applicable one is 0.4. It was predicted based on the 
possibilistic measure of uncertainty that the subject would use Rule #1, i.e. the 
CONTROL-D method. All fuzzy model predictions in the experiment were 
checked against the selection rule decisions made by the subjects. 
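
The MAX-MIN selection for this sub-task can be retraced in a few lines. The membership values are those listed for the five rules in the text; the dictionary layout and variable names are only an illustrative choice.

```python
# Membership values of each rule's antecedents for the sub-task
# "move down 27 lines to a position in column 20", as listed in the text.
antecedents = {
    "Rule #1": [0.4],
    "Rule #2": [0.0],
    "Rule #3": [0.3, 0.4],
    "Rule #4": [0.9, 0.3],
    "Rule #5": [0.0],
}

# MIN combines the antecedents of one rule (conjunction).
rule_possibility = {rule: min(values) for rule, values in antecedents.items()}

# MAX picks the most applicable rule among all candidates.
best_rule = max(rule_possibility, key=rule_possibility.get)

print(best_rule, rule_possibility[best_rule])
```

Rules #3 and #4 both reduce to 0.3 under MIN, so Rule #1 with a possibility of 0.4 is selected, in agreement with the derivation in the text.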

Results of the pilot study 

In the pilot study reported by Karwowski et al. (1989), one model was run 
using the fuzzy GOMS approach to the cursor placement task. Out of seventeen 
decisions, the fuzzy GOMS model predicted 13, or 76%, correctly. Another run 
was made by replacing the fuzzy quantifiers with non-fuzzy rules. The non-fuzzy 
GOMS model predicted only 8, or 47%, of the cursor placement decisions 
correctly. 


Table 2. Sample #1: cursor placement rules for the pilot study 
(after Karwowski et al., 1989). 

      WORD LOCATION                        METHODS
--------------------------   ----------------------------------
Number of    Word's column   Method   Fuzzy        Non-fuzzy
lines down   number          used     prediction   prediction
----------   -------------   ------   ----------   ----------
  8          15              3        3,4          3
 27          20              3        1            3
 12          14              4        4            3
 21          20              4        4            3
 44          21              5        1            1
 11          24              1        3            3
 10          29              3        3            3
 31          29              1        1            3
 26          18              1        1            3
  7          24              4        4            3
 29          22              1        1            3
101          25              2        2            2
100          22              5        5            5
  7           5              4        3            3
  4          42              4        4            4
 70          21              1        1            1
 12          20              4        4            3


It was also observed that the use of fuzzy concepts seemed very natural 
within the knowledge elicitation process. It seemed much easier to ask for fuzzy 
memberships in linguistic terms than it would be to try to ascertain exact 
cut-offs for selection rules. This observation supports the results of the study by 
Kochen (1975), who concluded that a higher degree of consistency in subjects' 
responses was found if they were allowed to give imprecise (verbal) descriptors of 
fuzzy concepts. 

FUZZY INTERACTIONS: MORE RESULTS 

Five subjects, graduate engineering students, participated in the main 
laboratory experiment reported by Karwowski et al. (1990). Subjects were asked to 
perform word placement while explaining which methods and associated selection 
rules they were choosing and why. If these selection rules 
appeared to have fuzzy components, these components were quantified by asking 
the subject to verbalize a membership value for the applicability of the rule. It 
was noted that fuzziness was based upon the participants' cognitive ability to 
measure terms such as "about one half of a page". 

For example, if the rule displayed a fuzzy component, either the paper or 
the screen (depending upon how the subjects referenced the fuzzy term) was pointed 
to, and the subjects were asked questions such as: "From 0 to 100 how much does 
this case belong to the class of FAR?" The resulting value was used to define the 
corresponding membership functions. 

One antecedent, universally identified, was: "If the word to be located is 
distinct, then." The subjects were asked to determine whether the word to be 
located was distinct or not (binary decision). Later, each participant was asked to 
rate the distinctness as a fuzzy number from 0 to 1. A somewhat surprising result 
was that, on the whole, participants were more correct in choosing the fuzzy 
"distinctness" as opposed to the binary, non-fuzzy "distinct" category. Once the 
methods and the selection rules were elicited, the subjects were asked to perform 
similar word placement tasks on a different file. The methods which they used 
were recorded, and this case served to test the validity of both the fuzzy and non- 
fuzzy models. 

For each of the fuzzy components of the text editing task, the uncertainty 
was quantified by presenting the subject with different scenarios. The cursor would 
be placed on the screen and a word pointed to, and the subject was asked to what 
degree such a scenario belonged to one of the fuzzy sets. An exhaustive collection of 
points was not conducted; rather, only a few points were taken, and the values 
between them were interpolated (graphically). 

The results of the main study are first illustrated using one subject only. 
Subject #2 utilized two methods to place the cursor at the given word. This 
reflected the subject's perception of the task as not an editing task, but rather a 
word location task. The methods used were: 1) Search for the pattern (/xxx would 
search for the next occurrence of pattern xxx), and 2) Search for a near pattern. The 
two rules utilized by the subject were simple: 1) If the word is "distinct" then use 
method #1; otherwise, 2) use method #2. 

The subject was asked to rate whether each word was distinct (for non- 
fuzzy analysis), and then later asked for the "distinctness" or a fuzzy number for 
each word (in essence giving a fuzzy rating). For simplicity, the subject was only 
asked to rate the 20 words to be searched for and not the words located nearby. 
This was not ideal because another rule was noted (but not verbalized): "If there is 
a 'very distinct' word 'near' the word to be located, search for that pattern instead." 

Table 3 shows the word number, distinctness ratings, methods actually 
used, and the non-fuzzy and fuzzy model predictions (differentiated by using the 
concept of distinctness to predict the subject's keystrokes). It is obvious that in the case of 
subject #2, the results were not conclusive, and did not imply that fuzziness helps 
in the GOMS modeling. The non-fuzzy model correctly predicted 55% of the 
keystrokes, while the fuzzy model predicted 60% of the keystrokes. This low 
rating may be due to the fact that the rules elicited were not those used, and that the 
relationship between the concept of distinctness and the methods used could depend on 
the distance to the searched word. 


Table 3. Example of results for subject #2 (after Karwowski et al., 1990). 

Word     Number     Distinct   Grade of       Method   Non-fuzzy    Fuzzy
number   of lines   (yes/no)   distinctness   used     prediction   prediction
------   --------   --------   ------------   ------   ----------   ----------
 1        34        Y          0.75           S        S*           S*
 2        12        Y          0.8            S        S*           S*
 3        52        Y          0.75           SN       S            S
 4       116        N          0.6            SN       SN*          S
 5         8        N          0.3            S        SN           SN
 6        30        N          0.35           S        SN           SN
 7        44        N          0.65           S        SN           S*
 8       118        Y          0.4            S        S*           SN
 9        12        N          0.55           S        SN           S*
10        54        N          0              SN       SN*          SN*
11        13        N          0              SN       SN*          SN*
12        16        N          0.35           S        SN           SN
13       171        N          0              SN       SN*          SN*
14        25        N          0.35           S        SN           SN
15         4        N          0              SN       SN*          SN*
16         4        Y          0.4            S        S*           SN
17        38        N          0.6            S        SN           S*
18       198        Y          0.45           SN       S            SN*
19        16        N          0              SN       SN*          SN*
20        14        Y          0.8            S        S*           S*

Rate of correct model predictions                      11/20        12/20
                                                       (55%)        (60%)

S  = Direct pattern search (method #1) 
SN = Search pattern near word (indirect pattern search: method #2) 
*  = Correct prediction 
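
The two prediction columns of Table 3 can be retraced with a short script. The non-fuzzy prediction follows the elicited rule (distinct word: direct search, otherwise search near). For the fuzzy prediction, a cut of 0.5 on the grade of distinctness is assumed here; this threshold is not stated in the text and was chosen only because it is consistent with the tabulated fuzzy predictions.

```python
# (distinct yes/no, grade of distinctness, method used), words 1-20 of Table 3.
ROWS = [
    ("Y", 0.75, "S"), ("Y", 0.8, "S"), ("Y", 0.75, "SN"), ("N", 0.6, "SN"),
    ("N", 0.3, "S"), ("N", 0.35, "S"), ("N", 0.65, "S"), ("Y", 0.4, "S"),
    ("N", 0.55, "S"), ("N", 0.0, "SN"), ("N", 0.0, "SN"), ("N", 0.35, "S"),
    ("N", 0.0, "SN"), ("N", 0.35, "S"), ("N", 0.0, "SN"), ("Y", 0.4, "S"),
    ("N", 0.6, "S"), ("Y", 0.45, "SN"), ("N", 0.0, "SN"), ("Y", 0.8, "S"),
]

THRESHOLD = 0.5  # assumed cut on the grade of distinctness (not from the text)


def non_fuzzy_prediction(distinct):
    """Binary rule: a distinct word is searched for directly."""
    return "S" if distinct == "Y" else "SN"


def fuzzy_prediction(grade, threshold=THRESHOLD):
    """Graded rule: search directly when distinctness reaches the threshold."""
    return "S" if grade >= threshold else "SN"


non_fuzzy_hits = sum(non_fuzzy_prediction(d) == used for d, _, used in ROWS)
fuzzy_hits = sum(fuzzy_prediction(g) == used for _, g, used in ROWS)

print(f"non-fuzzy: {non_fuzzy_hits}/{len(ROWS)}, fuzzy: {fuzzy_hits}/{len(ROWS)}")
```

Under this assumption the script yields 11 correct non-fuzzy and 12 correct fuzzy predictions out of 20, matching the rates reported in the table.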
Model prediction comparison 

Table 4 shows a summary of prediction performance for both models 
and all subjects. Overall, across all subjects and trials, the non-fuzzy GOMS 
model successfully predicted 58.7% of the responses, while the fuzzy GOMS 
model predicted 82.3% of the subjects' decisions. The Wilcoxon test showed that 
this difference was highly significant (chi-square statistic $\chi^2$ = 9.95, p < 0.01). 


Table 4. Summary of experimental results for the main study 
(after Karwowski et al., 1990). 

                       Success rate (correct prediction)
                       ----------------------------------
Subject    Number      Non-fuzzy GOMS     Fuzzy GOMS
number     of trials   prediction rate    prediction rate
-------    ---------   ---------------    ---------------
1           20          11 (55.0%)         12 (60.0%)
2           74          35 (47.0%)         63 (85.1%)
3           26          19 (73.0%)         22 (84.6%)
4           27          19 (70.4%)         21 (76.9%)
5          153          92 (60.1%)        129 (84.3%)

Total      300         176 (58.7%)        247 (82.3%)
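
The totals in Table 4 follow directly from the per-subject counts, as the following short check illustrates (the tuple layout is an arbitrary choice made for this sketch):

```python
# (subject, number of trials, non-fuzzy correct, fuzzy correct), from Table 4.
RESULTS = [
    (1, 20, 11, 12),
    (2, 74, 35, 63),
    (3, 26, 19, 22),
    (4, 27, 19, 21),
    (5, 153, 92, 129),
]

total_trials = sum(trials for _, trials, _, _ in RESULTS)
total_non_fuzzy = sum(correct for _, _, correct, _ in RESULTS)
total_fuzzy = sum(correct for _, _, _, correct in RESULTS)

non_fuzzy_rate = round(100 * total_non_fuzzy / total_trials, 1)
fuzzy_rate = round(100 * total_fuzzy / total_trials, 1)

print(total_trials, total_non_fuzzy, total_fuzzy, non_fuzzy_rate, fuzzy_rate)
```

Summing the rows gives 300 trials, with 176 (58.7%) correct non-fuzzy and 247 (82.3%) correct fuzzy predictions, as reported in the last row of the table.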


Several interesting observations were made through this expansion of the 
experimental data. The most important one was that in many cases adding the 
fuzzy functions helped tremendously in clarifying the meaning of rules. 
Specifically, a fuzzy definition of "distinctness" proved to be superior (in many 
cases) to its binary definition. Although the addition of fuzziness to the model 
structure could be seen as a "fine tuning" taking place in the elicitation process, 
this was not always the case (for example see results for subject #2). 

CONCLUSIONS 

Fuzzy methodologies can be very useful in the analysis and design of 
man-machine systems in general, and human-computer interaction systems in 
particular, by allowing one to model vague and imprecise relationships between the 
user and the computer. In order for this premise to succeed, one must identify the 
sources of fuzziness in the data and communication schemes relevant to 
human-computer interaction. By incorporating the concept of fuzziness and linguistic 
inexactness based on possibility theory into the model of system performance, 
better performance prediction for human-computer systems may be achieved. 

The imprecision-tolerant communication scheme for human-computer interaction 
tasks should be based on a fuzzy-theoretic extension of the GOMS model. In order 
to realize the potential benefits of a fuzzy communication scheme, the natural 
fuzziness of the Operators, Methods and Selection Rules of the GOMS model 
should be modeled in order to allow the user to communicate with the computer 
system in a vague but intuitively comfortable way. 

Since fuzziness plays an essential role in human cognition and performance, 
more research is needed to fully explore the potential of this concept in the area of 
human factors. It is believed that the theory of fuzzy sets and systems will allow 
one to account for natural vagueness, nondistributional subjectivity, and 
imprecision of man-machine systems which are too complex or too ill-defined to 
admit the use of conventional methods of analysis. 

A formal treatment of vagueness is an important and necessary step toward more 
realistic handling of the imprecision and uncertainty due to human behavior and 
thought processes at work. It is our view that the theory of fuzzy sets will prove 
successful in narrowing the gap between the world of the precise or "hard" sciences 
and the world of the cognitive or "soft" sciences. This can be achieved by 
providing a mathematical framework in which vague conceptual phenomena, where 
fuzzy descriptors, relations, and criteria are dominant (Zimmermann, 1985), can be 
adequately studied and modeled. 


QUESTIONNAIRES AND FUZZINESS 

INTRODUCTION 

Questionnaires represent hierarchical processes disjoining the elements of a 
given set by using successive tests or operators [12]. They involve the probabilities 
of the results of the tests, or the probabilities of the modalities of the operators. In the 
case where the tests or operators depend on imprecise factors, such as the accuracy of 
physical measurements or the linguistic description of variables, the questionnaires 
take into account coefficients evaluating the fuzziness of the data. The construction 
of such questionnaires is subject to several kinds of constraints and requires 
appropriate algorithms. 

When the questionnaire is characterized only by probabilistic elements, it is 
generally of interest to minimize its average length in order to improve the 
efficiency of the process it represents, with respect to some basic constraints. The 
tests or operators can be chosen with regard to the quantity of information they 
process. The construction of the most efficient questionnaire is either holistic 
[11, 12], taking into account all the tests or operators which must be used, or 
selective, based on the choice of the most significant tests or operators with regard 
to the purpose of the process [2, 13, 14]. 

If fuzzy criteria are involved, the efficiency of the questionnaire 
concerns the specificity of the results it provides. A trade-off must be found 
between the preservation of some fuzziness in the tests or operators, allowing 
flexibility in the management of the available data, and the reliability of the results 
obtained through the questionnaire [3, 4]. 





The support of a questionnaire is a finite, directed and valued graph without 
circuits, in which every vertex is connected to a distinguished vertex, called the 
root, by at least one path or series of edges (exactly one in the case of arborescent 
questionnaires). No edge ends at the root. There exist terminal vertices from which 
no edge descends. Several systems of valuations can be defined for the edges and the 
vertices, for instance probabilistic valuations, utility values, coefficients of fuzziness. 
The tests or operators are attached to the non-terminal vertices and their possible 
results or modalities are associated with the edges descending from these vertices. 

The simplest case of a questionnaire is arborescent. Such a model is 
extensively used in various fields and it corresponds to weighted trees. In the 
classical probabilistic framework, where the data are associated with a given 
uncertainty, applications of arborescent questionnaires exist, for instance, in the 
study of search trees, decision trees, fault trees, species identification, hierarchical 
classification, diagnosis assistance, decision-making, preference elicitation and 
knowledge acquisition. 
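
To make the notion of an arborescent questionnaire concrete, the following sketch represents one as a weighted tree and computes its average length, assuming the usual definition (the sum, over terminal vertices, of the probability of the vertex times the number of questions on the path from the root). The `Node` class and the small example tree are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A vertex of an arborescent questionnaire (illustrative structure)."""
    label: str
    prob: float = 0.0  # probability attached to a terminal vertex
    children: list = field(default_factory=list)


def average_length(node, depth=0):
    """Expected number of questions asked before a terminal vertex is reached."""
    if not node.children:  # terminal vertex: contributes prob * depth
        return node.prob * depth
    return sum(average_length(child, depth + 1) for child in node.children)


# Root question q1 separates d1 from {d2, d3}; question q2 separates d2 from d3.
root = Node("q1", children=[
    Node("d1", prob=0.5),
    Node("q2", children=[Node("d2", prob=0.25), Node("d3", prob=0.25)]),
])
avg = average_length(root)
```

Here the most probable element is identified after a single question, which is what keeps the average length low.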

When the involved tests or operators are not precisely described, arborescent 
questionnaires must take into account both uncertainty and imprecision and they 
must lead to conclusions which are acceptable in spite of the imprecision. We present 
here several uses of questionnaires in a fuzzy framework. 


QUESTIONNAIRES WITH LINGUISTIC VARIABLES 

Let us consider a given set $D = \{d_1, \ldots, d_n\}$ of elements to identify, for 
instance a set of decisions to make, of diagnoses to identify, of classes to recognize, 
which are supposed to be defined without any ambiguity. We suppose that the 
probability distribution $P = \{p_1, \ldots, p_n\}$ is available, with $p_j$ the probability 
of $d_j$ to be present in the considered world, for $1 \le j \le n$. 

We also consider a set $Q = \{q_1, \ldots, q_m\}$ of so-called "questions", which 
represent tests or operators. A question $q_i$ is a link between a linguistic variable 
$X_i$ defined on a universe $U_i$ and a family of $a(i)$ labels, denoted by 
$q_i^1, \ldots, q_i^{a(i)}$, and associated with possibility distributions 
$f_i^1, \ldots, f_i^{a(i)}$, defined on $U_i$ and lying in [0, 1] (see Figure 1). 

[Figure 1. Two panels: example of continuous possibility distributions (axis: 
redness of the skin); example of discrete possibility distributions (axis: normality).] 
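
A question in this sense can be sketched as a mapping from labels to possibility distributions on a common universe. The example below uses a discrete universe; the label names and the numerical grades are invented for illustration and are not taken from Figure 1.

```python
# Discrete universe U_i for a hypothetical linguistic variable "normality".
universe = [0, 1, 2, 3, 4]

# One question q_i: each label q_i^k is mapped to its distribution f_i^k on U_i.
question = {
    "abnormal":   {0: 1.0, 1: 0.7, 2: 0.2, 3: 0.0, 4: 0.0},
    "borderline": {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5, 4: 0.0},
    "normal":     {0: 0.0, 1: 0.0, 2: 0.2, 3: 0.7, 4: 1.0},
}


def check_question(q, points):
    """Check that every distribution is defined on the universe and lies in [0, 1]."""
    return all(
        set(f) == set(points) and all(0.0 <= v <= 1.0 for v in f.values())
        for f in q.values()
    )


def best_labels(q, u):
    """Labels whose possibility at the point u is maximal."""
    top = max(f[u] for f in q.values())
    return sorted(k for k, f in q.items() if f[u] == top)
```

For instance, `best_labels(question, 1)` selects the label "abnormal", although "borderline" remains possible to degree 0.5 at that point.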

Two different types of problems can be considered, depending on whether the 
questions of Q are deterministic or not with regard to the elements of D: 

- either there is a possibilistic relationship between lists of answers to questions of 
Q and the elements of D, yielding the possibility that $d \in D$ is concerned in a 
studied situation, according to the obtained answers, and the certainty we can have 
in this assertion [7]. We construct a questionnaire by successively choosing 
questions of Q bringing as much information as possible on the elements of D and 
we stop asking new questions when an element of D is sufficiently well identified 
(selective construction). 

- or there is a precise relationship between lists of answers to questions of Q and 
elements of D, and we construct a questionnaire by ordering the questions of Q in 
such a way that every element of D can be associated with a terminal vertex of the 
questionnaire (holistic construction) [1, 3]. 


SELECTIVE CONSTRUCTION OF QUESTIONNAIRES 

(See annex 1 for technical details about this section). 

In a probabilistic study, the probabilities prob($q_i^k / d_j$), $1 \le i \le m$, 
$1 \le k \le a(i)$, $1 \le j \le n$, of obtaining every label associated with a question 
would be given for every element of D. In many cases, there is no means of knowing 
these probabilities and the only knowledge we have regarding the simultaneous 
presence of a given label and an element of D is possibilistic. 

Let us suppose given the possibility $\pi(d_j / q_i^k)$ that we are faced with the 
case $d_j$ of D, for $1 \le j \le n$, when we obtain the label $q_i^k$ for question $q_i$, 
for $1 \le i \le m$, $1 \le k \le a(i)$. As there is no absolute certainty that this answer 
implies that $d_j$ must be identified, we also suppose given the necessity 
$N(d_j / q_i^k)$ quantifying this certainty. 

We can also suppose given some knowledge about the fact that the element 
$d_j$ can be thought of when an answer different from $q_i^k$ is obtained to question 
$q_i$: let $\pi(d_j / \neg q_i^k)$ and $N(d_j / \neg q_i^k)$ denote the possibility and the 
certainty that $d_j$ is acceptable when $q_i^k$ is not obtained. If these values are not 
precisely known, they will be replaced [10] by the interval [0, 1] to which they belong. 

We fix thresholds s and t in [0, 1]: s is the lowest acceptable possibility and t 
the lowest acceptable certainty, so that an element of D is considered satisfying, 
when given labels are obtained for a question, if its possibility lies in [s, 1] and 
its certainty lies in [t, 1]. 

The problem we consider is the following: 

- first of all, how to determine the sequence of questions necessary and sufficient to 
identify every element of D as reliably and efficiently as possible (step 1), 

- secondly, how to use this sequence of questions every time we have to recognize 
a particular case under study (step 2). 


Applications of this model can be found, for instance, in knowledge acquisition, 
in diagnosis assistance and in species identification. The first step corresponds 
to the construction of the sequence of questions providing the best recognition of 
classes on a training set of examples; the second step is associated with the 
identification of the appropriate class for an example not belonging to the training 
set. 


Step 1: 

It is obvious that the element $d_j$ of D will be immediately recognized if there 
is a question $q_i$ yielding an answer $q_i^k$ such that 
$N(d_j / q_i^k) = \pi(d_j / q_i^k) = 1$. No further question will be necessary in this 
case, but at least one other question must be asked in the general case. 

The first question to be asked will be the $q_i$, for $1 \le i \le m$, processing the 
most efficient information about the elements of D, and we propose to evaluate this 
efficiency by means of the average certainty Cer($q_i$) provided by $q_i$ on the 
recognition of any element of D. 

Then, the first question to be asked will be $q_i$ such that Cer($q_i$) is maximum. 
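
The choice of the first question can be sketched as follows, assuming that formula (1) of Annex 1 defines Cer($q_i$) as the sum, over the labels $q_i^k$ and the elements $d_j$, of $N(d_j / q_i^k)\,p_j$. The necessity values and probabilities below are invented for illustration.

```python
def average_certainty(necessity_rows, p):
    """Cer(q_i): sum over labels k and elements j of N(d_j / q_i^k) * p_j.

    necessity_rows[k][j] holds N(d_j / q_i^k); p[j] is the probability of d_j.
    """
    return sum(n_kj * p[j] for row in necessity_rows for j, n_kj in enumerate(row))


# Hypothetical data: three elements of D, two candidate questions.
p = [0.5, 0.3, 0.2]
necessities = {
    "q1": [[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]],                   # two labels
    "q2": [[0.2, 0.0, 0.0], [0.0, 0.4, 0.0], [0.0, 0.0, 0.6]],  # three labels
}

cer = {name: average_certainty(rows, p) for name, rows in necessities.items()}
first_question = max(cer, key=cer.get)  # question with maximum average certainty
```

With these values, q1 is asked first because its labels carry more certainty about the most probable elements.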

Now, let us suppose that a sequence $S_r = (x_1, \ldots, x_r)$ of questions is not 
sufficient to determine an element $d_{j_0}$ of D such that its possibility to be 
present, given the answers it provides to questions $x_1, \ldots, x_r$, is sufficiently 
high and the certainty available on its identification is acceptable (see Figure 2). 

If labels $x_1^{k(1)}, \ldots, x_r^{k(r)}$ are respectively obtained for this sequence 
$S_r$ of questions, we evaluate the possibility 
Pos($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) that $d_{j_0}$ should be identified, and 
the certainty Nec($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) that this identification is 
satisfying. 

As the sequence $S_r$ of questions is not sufficient to identify an element of D 
with the list of obtained labels $x_1^{k(1)}, \ldots, x_r^{k(r)}$, a new question $q_i$ 
must be asked. 

For an obtained label $q_i^k$, we evaluate the average certainty 
Cer($x_1^{k(1)}, \ldots, x_r^{k(r)}, q_i^k$) provided by these (r+1) questions on any 
element of D. 

We choose the question $q_i$ which processes the most efficient information 
about D, with regard to all its possible labels, or, equivalently, which gives the 
highest absolute certainty C($q_i$). 


No further question will be asked when a sequence of labels 
$x_1^{k(1)}, \ldots, x_r^{k(r)}$ is obtained for questions $x_1, \ldots, x_r$, and there 
exists an element $d_{j_0}$ of D such that 
Pos($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) $\ge s$ and 
Nec($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) $\ge t$. 

Then $d_{j_0}$ will be associated with $S_r$, which is called terminal. 
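
The stopping rule can be illustrated by the sketch below. It assumes, following the formulas of Annex 1 as we read them, that Pos is the minimum of the possibilities attached to the obtained labels and Nec the maximum of the corresponding necessities; the function names and the data are hypothetical.

```python
def pos(pi_values):
    """Pos(d / labels): minimum of the possibilities given each obtained label."""
    return min(pi_values)


def nec(n_values):
    """Nec(d / labels): maximum of the necessities given each obtained label."""
    return max(n_values)


def identified_elements(candidates, s, t):
    """Elements d with Pos >= s and Nec >= t for the obtained sequence of labels.

    candidates maps an element name to a pair (possibilities, necessities),
    one value per question already asked.
    """
    return [
        d for d, (pis, ns) in candidates.items()
        if pos(pis) >= s and nec(ns) >= t
    ]


# Two questions have been asked; thresholds s = 0.8 and t = 0.7.
candidates = {
    "d1": ([1.0, 0.9], [0.0, 0.8]),
    "d2": ([0.4, 1.0], [0.0, 0.0]),
}
terminal = identified_elements(candidates, s=0.8, t=0.7)
```

A non-empty result means that the sequence of questions is terminal for the obtained labels; an empty one means that a further question must be chosen.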


Step 2: 

For a new given particular situation $c_0$, an element of D must be identified 
from the answers to the various questions of the questionnaire we have 
constructed. 

Let $S_r$ be a terminal sequence of questions of Q in this questionnaire, to which 
$c_0$ provides answers $x_1^{k(1)}, \ldots, x_r^{k(r)}$ chosen, for every question, in the 
list of available labels. Then, we clearly identify the element of D associated with 
$S_r$. 

As the labels associated with every question are not precise, we must accept 
that an answer is provided in a way somewhat different from the expression we 
expect in the list of authorized labels. Let us denote by $q'_i$ the label obtained as an 
answer to question $q_i$, $1 \le i \le m$, more or less different from all the $q_i^k$, 
$1 \le k \le a(i)$, and by $g_i$ the possibility distribution describing $q'_i$, defined on 
$U_i$ and lying in [0, 1] (see Figure 3). 



The compatibility of this answer $q'_i$ with one of the labels $q_i^k$ proposed for 
$q_i$, with $1 \le k \le a(i)$, is measured by the classical possibility and necessity 
measures of adequation [9], respectively denoted by $\pi(q_i^k; q'_i)$ and 
$N(q_i^k; q'_i)$. 
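
On a discretized universe, these two measures reduce to a maximum of minima and a minimum of maxima. The sketch below follows formulas (6) and (7) of Annex 1 as they appear there, with the supremum and infimum replaced by `max` and `min` over a finite grid (an approximation we introduce); the distributions are invented.

```python
def possibility_of_match(f, g, universe):
    """pi(q; q') = sup over u of min(f(u), g(u))."""
    return max(min(f[u], g[u]) for u in universe)


def necessity_of_match(f, g, universe):
    """N(q; q') = inf over u of max(1 - f(u), g(u)), as written in Annex 1."""
    return min(max(1.0 - f[u], g[u]) for u in universe)


universe = [0, 1, 2, 3]
f = {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5}  # expected label q_i^k
g = {0: 0.0, 1: 1.0, 2: 1.0, 3: 0.0}  # obtained answer q'_i

pi_match = possibility_of_match(f, g, universe)
n_match = necessity_of_match(f, g, universe)
```

Here the answer overlaps the core of the label, so the match is fully possible, but it excludes a point that the label partly allows, so the match is only certain to degree 0.5.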


We deduce the possibility $\pi^k(d_j)$ that $d_j$ is concerned by the particular 
situation $c_0$, according to the proximity of its answer with $q_i^k$, and the certainty 
of this assertion $N^k(d_j)$. This evaluation will be performed for the labels $q_i^k$ 
such that $\pi(q_i^k; q'_i) \ge s$ and $N(q_i^k; q'_i) \ge t$. It is then possible to have 
several sequences of questions to use, i.e. several paths of the questionnaire to 
follow, before the recognition of a particular element of D. 

More generally, let us consider again a terminal sequence $S_r$ of questions 
leading to the identification of $d_{j_0}$ in the questionnaire. Because of the 
differences which may exist between the expected answers to these questions and 
the labels obtained from the particular case $c_0$, the possibility and certainty of 
$d_{j_0}$ will be the following: 

$$\mathrm{Pos}(d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}) = \min_{1 \le i \le r} \pi^{k(i)}(d_{j_0}),$$

$$\mathrm{Nec}(d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}) = \max_{1 \le i \le r} N^{k(i)}(d_{j_0}).$$

The element $d_{j_0}$ will be definitely identified for the situation $c_0$, by means 
of the sequence of questions, if there exist labels $x_1^{k(1)}, \ldots, x_r^{k(r)}$ yielding 
Pos($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) $\ge s$ and 
Nec($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) $\ge t$. 

HOLISTIC CONSTRUCTION OF QUESTIONNAIRES 

(See annex 2 for technical details about this section). 

Let us suppose that we want to use all the tests or operators of Q, and we 
have to order them in such a way that the questionnaire we construct associates an 
element of D with each terminal node. The questionnaire could be arborescent or 
not. We suppose that Q and D are compatible, which means that such a construction 
is possible. 

The problem we consider is the choice of the questions providing the most 
efficient questionnaire with regard to the recognition to be made. Its quality can be 
evaluated [3] with respect to the fuzziness which is involved in the characterizations 
deduced from the fuzzy tests or operators, and improved, when several 
constructions of questionnaires are possible, by an appropriate choice of the order 
of some questions. Applications can be found in search trees and in 
species identification, for instance. 

Several approaches to such a choice can be proposed [3, 4, 6] and we describe 
one method below. 

Let us suppose that the labels associated with the questions $q_i$ of Q are 
conveniently defined in such a way that they determine a fuzzy partition $\Pi_i$ of the 
universe $U_i$ on which the concerned linguistic variable $X_i$ is defined. The classes 
of this fuzzy partition are fuzzy subsets of $U_i$ defined by membership functions 
equal to $f_i^k$, $1 \le k \le a(i)$, in every point of $U_i$. We suppose given the 
probability distribution $P_i$ of the variable $X_i$, for the studied population. 

The problem we consider is the identification of a crisp (non-fuzzy) partition 
of $U_i$, able to represent the information contained in $\Pi_i$. We may think of 
several applications of this problem: in knowledge acquisition, if the training set 
deals with crisp data, and hence non-fuzzy tests or operators, while the new 
examples are described by means of fuzzy questions; in decision-making, when a 
crisp decision must be taken from fuzzy tests or operators, or from the answers 
provided by the inquired person to a crisp question $q_i$ by indicating preference 
grades for the elements $q_i^k$ which are proposed to him; in preference elicitation, 
when the inquirer makes a choice between two fuzzy questions about the same 
variable. 

For a given threshold r in [0, 1], we associate with $\Pi_i$ a crisp partition 
$\Pi_i^*$ of level r, by defining crisp classes as 
$q_i^{k*} = \{ u / f_i^k(u) \ge r \}$, $1 \le k \le a(i)$. Obviously, such a crisp partition 
does not exist for every value of r, and some thresholds correspond to several 
possible crisp partitions. We suppose that the tests or operators are defined in such a 
way that there always exists a value r providing a crisp partition. We can consider 
the average weight of each fuzzy label by introducing its r-probability 
$P_i^r(q_i^k)$ as the average value of its associated possibility distribution, for the 
values at least equal to r. 
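
The remark that a crisp partition of level r does not exist for every r can be checked on a small example. The sketch below builds the classes $\{u / f(u) \ge r\}$ for a hypothetical two-label fuzzy partition of a four-point universe and tests whether they form a partition; the names and grades are ours.

```python
def crisp_classes(labels, universe, r):
    """Crisp class {u : f(u) >= r} for every label of the fuzzy partition."""
    return {k: {u for u in universe if f[u] >= r} for k, f in labels.items()}


def is_crisp_partition(classes, universe):
    """True when the classes are non-empty, pairwise disjoint and cover the universe."""
    sets = list(classes.values())
    covered = set().union(*sets)
    disjoint = sum(len(c) for c in sets) == len(covered)
    return all(sets) and disjoint and covered == set(universe)


universe = [0, 1, 2, 3]
labels = {  # the grades of the two labels sum to 1 at every point
    "low":  {0: 1.0, 1: 0.6, 2: 0.3, 3: 0.0},
    "high": {0: 0.0, 1: 0.4, 2: 0.7, 3: 1.0},
}

ok_at_half = is_crisp_partition(crisp_classes(labels, universe, 0.5), universe)
ok_at_high = is_crisp_partition(crisp_classes(labels, universe, 0.8), universe)
ok_at_low = is_crisp_partition(crisp_classes(labels, universe, 0.3), universe)
```

With r = 0.5 the classes {0, 1} and {2, 3} partition the universe; with r = 0.8 two points are left uncovered, and with r = 0.3 the classes overlap.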

This generalization of the concept of probability to a fuzzy subset of the 
universe allows one to measure the fuzzy information $I_i^r(\Pi_i)$ processed by 
$\Pi_i$ for the threshold r with respect to the crisp partition $\Pi_i^*$. We use this 
tool as a measure of the proximity between $\Pi_i$ and $\Pi_i^*$. 

Let us consider the case where we are given a set of fuzzy operators Q and 
we look for the crisp partition associated with each of them, losing as little 
information as possible when passing from fuzzy descriptions to crisp descriptions. 
For every $q_i$ associated with the fuzzy partition $\Pi_i$ of $U_i$, we choose the 
crisp partition $\Pi_i^*$ such that the fuzzy information processed by $\Pi_i$ for the 
threshold r with respect to $\Pi_i^*$ is maximum. 

If several tests or operators are available for the same linguistic variable $X_i$ 
on $U_i$, the most interesting is the one processing the greatest absolute fuzzy 
information with regard to all the possible crisp partitions which could be 
associated with it. 


Annex 1: 

Possibility and necessity coefficients associated with every element $d_j$ of D, 
when the label $q_i^k$ is obtained as an answer to the test or the operator $q_i$ of Q, 
are respectively denoted by $\pi(d_j / q_i^k)$ and $N(d_j / q_i^k)$. They belong to 
[0, 1] and they are such that $N(d_j / q_i^k) \le \pi(d_j / q_i^k)$, with 
$N(d_j / q_i^k) = 0$ if $\pi(d_j / q_i^k) < 1$ and $\pi(d_j / q_i^k) = 1$ if 
$N(d_j / q_i^k) \ne 0$. 

The average certainty Cer($q_i$) provided by a single test or operator $q_i$ on the 
recognition of any element of D is defined as follows: 

$$\mathrm{Cer}(q_i) = \sum_{1 \le k \le a(i)} \sum_{1 \le j \le n} N(d_j / q_i^k)\, p_j. \qquad (1)$$

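The possibility Pos($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) that the element 
$d_{j_0}$ of D must be identified, and the certainty 
Nec($d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}$) that this identification is satisfying, 
when labels $x_1^{k(1)}, \ldots, x_r^{k(r)}$ are obtained as answers to tests or operators 
$x_1, \ldots, x_r$, are evaluated by means of the following coefficients: 

$$\mathrm{Pos}(d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}) = \min_{1 \le i \le r} \pi(d_{j_0} / x_i^{k(i)}), \qquad (2)$$

$$\mathrm{Nec}(d_{j_0} / x_1^{k(1)}, \ldots, x_r^{k(r)}) = \max_{1 \le i \le r} N(d_{j_0} / x_i^{k(i)}). \qquad (3)$$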

We define as follows the average certainty Cer($x_1^{k(1)}, \ldots, x_r^{k(r)}$) 
provided about the recognition of any element of D by a sequence of labels 
$x_1^{k(1)}, \ldots, x_r^{k(r)}$ obtained as answers to tests or operators 
$x_1, \ldots, x_r$ of Q: 

$$\mathrm{Cer}(x_1^{k(1)}, \ldots, x_r^{k(r)}) = \sum_{1 \le j \le n} A_j^k\, \mathrm{Nec}(d_j / x_1^{k(1)}, \ldots, x_r^{k(r)})\, p_j, \qquad (4)$$

with $A_j^k = 1$ if Pos($d_j / x_1^{k(1)}, \ldots, x_r^{k(r)}$) $\ge s$, where s is the 
possibility threshold, and 0 otherwise. 

The absolute certainty of a test or an operator $q_i$ of Q, after the sequence of 
labels $x_1^{k(1)}, \ldots, x_r^{k(r)}$ is obtained as answers to tests or operators 
$x_1, \ldots, x_r$, is defined as follows: 

$$C(q_i) = \frac{1}{a(i)} \sum_{1 \le k \le a(i)} \mathrm{Cer}(x_1^{k(1)}, \ldots, x_r^{k(r)}, q_i^k). \qquad (5)$$
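
Formulas (4) and (5) can be sketched as follows, under our reading of the garbled source: (4) sums Nec times $p_j$ over the elements whose possibility reaches the threshold s, and (5) averages the resulting values over the $a(i)$ labels of the candidate question. All numerical values are invented.

```python
def sequence_certainty(pos_by_element, nec_by_element, p, s):
    """Formula (4): sum of Nec * p_j over the elements whose Pos is at least s."""
    return sum(
        nec_j * p_j
        for pos_j, nec_j, p_j in zip(pos_by_element, nec_by_element, p)
        if pos_j >= s
    )


def absolute_certainty(cer_per_label):
    """Formula (5): mean of the sequence certainties over the labels of q_i."""
    return sum(cer_per_label) / len(cer_per_label)


p = [0.5, 0.3, 0.2]  # probabilities of d_1, d_2, d_3
s = 0.8              # possibility threshold

# Pos and Nec of each element after adding label 1, then label 2, of a question.
cer_label_1 = sequence_certainty([1.0, 0.5, 0.9], [0.6, 0.9, 0.5], p, s)
cer_label_2 = sequence_certainty([0.2, 1.0, 1.0], [0.9, 0.8, 0.0], p, s)
c_question = absolute_certainty([cer_label_1, cer_label_2])
```

The question retained at each stage of the selective construction would then be the one with the largest value of `c_question`.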

Possibility measure of the adequation of any answer $q'_i$ with a given label 
$q_i^k$, for a question (test or operator) $q_i$ of Q: 

$$\pi(q_i^k; q'_i) = \sup_{u \in U_i} \min(f_i^k(u), g_i(u)), \quad 1 \le k \le a(i). \qquad (6)$$

Necessity measure of this adequation: 

$$N(q_i^k; q'_i) = \inf_{u \in U_i} \max(1 - f_i^k(u), g_i(u)), \quad 1 \le k \le a(i). \qquad (7)$$

For the particular situation $c_0$, the possibility $\pi^k(d_j)$ that $d_j$ is concerned 
according to the proximity of the obtained answer $q'_i$ with $q_i^k$, and the certainty 
of this assertion $N^k(d_j)$, will be evaluated by the following coefficients [10, 15]: 

$$\pi^k(d_j) = \max[\min\{\pi(d_j / q_i^k), \pi(q_i^k; q'_i)\}, \min\{\pi(d_j / \neg q_i^k), 1 - N(q_i^k; q'_i)\}], \qquad (8)$$

$$N^k(d_j) = \min[\max\{N(d_j / q_i^k), 1 - \pi(q_i^k; q'_i)\}, \max\{N(d_j / \neg q_i^k), N(q_i^k; q'_i)\}]. \qquad (9)$$

As indicated in [10], the values can be replaced by the interval to which they 
belong in the case where they are not precisely known. 
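
The two coefficients can be sketched as below. Formula (8) is transcribed as printed; the first term of formula (9) is damaged in the source, and we read it as the dual of (8), which is an assumption. The function names are ours.

```python
def possibility_given_answer(pi_d_q, pi_d_notq, pi_match, n_match):
    """Formula (8): possibility that d_j is concerned, given an imprecise answer."""
    return max(min(pi_d_q, pi_match), min(pi_d_notq, 1.0 - n_match))


def certainty_given_answer(n_d_q, n_d_notq, pi_match, n_match):
    """Formula (9), with its first term assumed to be max(N(d/q), 1 - pi(q; q'))."""
    return min(max(n_d_q, 1.0 - pi_match), max(n_d_notq, n_match))


# When the answer matches the label exactly (pi_match = n_match = 1),
# the coefficients attached to the label itself are recovered.
exact_pi = possibility_given_answer(0.9, 0.3, pi_match=1.0, n_match=1.0)
exact_n = certainty_given_answer(0.6, 0.1, pi_match=1.0, n_match=1.0)

# Unknown knowledge about "not q" is replaced by the bounds of [0, 1].
upper_pi = possibility_given_answer(0.9, 1.0, pi_match=0.7, n_match=0.2)
```

The exact-match case is a useful sanity check of this reading: a perfectly matching answer should not alter the knowledge attached to the expected label.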


Annex 2: 

A fuzzy partition $\Pi_i$ of the universe $U_i$ on which the concerned linguistic 
variable $X_i$ is defined satisfies: 

$$\sum_{1 \le k \le a(i)} f_i^k(u) = 1 \quad \text{for every point } u \text{ in } U_i,$$

$$\text{and} \quad \sum_{u \in U_i} f_i^k(u) > 0 \quad \text{for every } k,\ 1 \le k \le a(i).$$

The r-probability $P_i^r(q_i^k)$ of a fuzzy label $q_i^k$ with regard to the crisp 
class $q_i^{k*}$ is defined by: 

$$P_i^r(q_i^k) = \sum_{u \in q_i^{k*}} f_i^k(u)\, P_i(u).$$

The fuzzy information $I_i^r(\Pi_i)$ processed by a fuzzy partition $\Pi_i$ of 
$U_i$ for the threshold r with respect to the crisp partition $\Pi_i^*$ is defined as 
follows: 

$$I_i^r(\Pi_i) = \sum_{1 \le k \le a(i)} L(P_i^r(q_i^k)) \Big/ \Big[ \sum_{1 \le k \le a(i)} P_i^r(q_i^k) \Big],$$

with the function $L(x) = -x \log(x)$. 
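
A numerical sketch of the r-probability and of the fuzzy information is given below. It assumes the reading of the two definitions above (sum of $f(u) P(u)$ over the crisp class, then the ratio of the sum of the $L$ values to the sum of the r-probabilities), a natural logarithm, and $L(0) = 0$ by continuity; none of these choices is stated in the source, and the data are invented.

```python
import math


def r_probability(f, p, r):
    """Sum of f(u) * P(u) over the crisp class {u : f(u) >= r}."""
    return sum(f[u] * p[u] for u in f if f[u] >= r)


def L(x):
    """L(x) = -x log(x), with L(0) taken as 0 by continuity (assumption)."""
    return 0.0 if x == 0 else -x * math.log(x)


def fuzzy_information(labels, p, r):
    """Sum of L(P^r) over the labels, divided by the sum of the P^r."""
    probs = [r_probability(f, p, r) for f in labels.values()]
    return sum(L(x) for x in probs) / sum(probs)


labels = {
    "low":  {0: 1.0, 1: 0.6, 2: 0.3, 3: 0.0},
    "high": {0: 0.0, 1: 0.4, 2: 0.7, 3: 1.0},
}
p = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}  # uniform distribution of the variable

p_low = r_probability(labels["low"], p, 0.5)
info = fuzzy_information(labels, p, 0.5)
```

Comparing `info` across the admissible thresholds is one way to select the crisp partition closest to the fuzzy one.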

Properties of this fuzzy information lead to its maximization in order to have 
the best compatibility between a fuzzy partition and any possible associated crisp 
partition for a given threshold r. 

The absolute fuzzy information processed by a fuzzy partition $\Pi_i$ with 
regard to all the possible crisp partitions which could be associated with it equals: 

$$\max \{ I_i^r(\Pi_i) \ / \ \Pi_i^* \text{ associated with } \Pi_i \}.$$


ABSTRACT 

This tutorial paper has been written for biologists, physicians or beginners 
in fuzzy sets theory and applications. This field is introduced in the framework of 
medical diagnosis problems. The paper describes and illustrates with practical 
examples, a general methodology of special interest in the processing of borderline 
cases, that allows a graded assignment of diagnoses to patients. A pattern of medical 
knowledge consists of a tableau with linguistic entries or of fuzzy propositions. 
Relationships between symptoms and diagnoses are interpreted as labels of fuzzy 
sets. It is shown how possibility measures (soft matching) can be used and combined 
to derive diagnoses after measurements on collected data. 

The concepts and methods are illustrated in a biomedical application on 
inflammatory protein variations. In the case of poor diagnostic classifications, 
appropriate weightings are introduced, acting on the characterizations of proteins, 
in order to decrease their relative influence. As a consequence, when pattern 
matching is achieved, the final ranking of inflammatory syndromes assigned to a 
given patient might change to better fit the actual classification. Defuzzification of 
results (i.e. diagnostic groups assigned to patients) is performed as a non-fuzzy set 
partition derived from a "separating power", and not by the center of gravity 
method commonly employed in fuzzy control. 

A model of a fuzzy connectionist expert system is then introduced, in which 
an artificial neural network is designed to build the knowledge base of an expert 
system from training examples (this model can also be used for the specification of 
rules in fuzzy logic control). Two types of weights are associated with the 
connections: primary linguistic weights, interpreted as labels of fuzzy sets, and 
secondary numerical weights. Cell activation is computed through MIN-MAX fuzzy 
equations of the weights. Learning consists in finding the (numerical) weights and 
the network topology. This feedforward network is described and illustrated in the 
same biomedical domain as in the first part. 


Keywords: Fuzzy Logic, Linguistic Model, Fuzzy Propositions, 
Medical Knowledge Representation, Medical Diagnosis, Soft Matching, 
Relative Importance, Defuzzification, Separating Power, Artificial Neural 
Networks, Fuzzy Connectionist Expert Systems, Linguistic Weights. 

INTRODUCTION 

In many situations, physicians use subjective or intuitive judgments. They 
cannot always logically, or in simple terms, explain how they derive conclusions, 
because of the complex mental processes inherent to the nature of the cases to be 
diagnosed, or to the difficulty of recalling their years of training and experience. 

Interpretation of biological analyses suffers from some arbitrariness, 
particularly at the boundaries of the quantities that are measured or evaluated. It is 
customary to use symbols like +++, ++, +, N, -, --, ---, or ↑↑↑, ↑↑, ↑, N, ↓, ↓↓, ↓↓↓ 
to denote variations ('N' stands for 'Normal'). In general, limits of values 
that characterize abnormalities, or normality, define numerical intervals that are used 
to describe standards in variations. First of all, normal or non-pathological states 
have to be determined. They constitute the reference with respect to which 
abnormalities are specified. 

Biologists are familiar with normal variation ranges that are a prerequisite to 
a proper interpretation of all laboratory tests. Notions of statistical normality are 
usually derived from frequency distributions, not always confined to Gaussian 
distributions. But, depending on the measurement procedures of a given laboratory, 
on the epidemiologist, the biologist or the clinician who manipulates and interprets 
measurements, but also on the nature of the populations under study, and on 
conditions of physiological (biological) normality, one commonly has to rely on 
fiducial limits (see for example [1,2] for discussions on normality). 

The main drawback in working with intervals to represent normality, or 
ranges of variations for abnormalities, is the weak reliability on thresholds. 
Moreover, such boundaries are more or less physician dependent in practice. For 
example [3], "the normal base-line value for a given individual's lactic acid 
dehydrogenase may be at the extreme low point of the normal range for the general 
populations. Thus he (the physician) could develop an elevation due to a disease 
process that is significant and still within the normal range of the population." Still 
in [3], under a table defining the range of normal values for blood chemistry, one 
may read : "these ranges are a guide to the normal concentrations of blood 
constituents. For accurate interpretations, always refer to normal values established 
by individual laboratories, since individual differences in procedures may affect the 
actual ranges." 

A problem that is often posed lies in the ill-definition and in the treatment 
of the boundaries of the intervals. To cope with borderline cases, fuzzy set theory 
provides very natural and appropriate tools. So it is here assumed that imprecision in 
the description of variations is of a fuzzy type and terms like "Normal, Slightly 
Decreased, Very Increased, etc.," will be treated as labels of fuzzy sets in (possibly 
different) universes of discourse. These fuzzy sets represent linguistic intervals, and 
around cutoff boundaries, very close points will not be totally accepted or rejected 
as in yes-or-no procedures, according to their position with respect to the frontier. 

A coding with ↑'s or ↓'s is sometimes too restrictive: it is not always 
possible to choose between ↑ and ↑↑ for example, and in some patterns one may find 
"from ↑ to ↑↑." A scale with degrees ranging from 0 to 1 is very convenient. Note 
that it is not necessary to set up precise values in [0,1]: in interpreting patterns, it is 
sufficient to have a rough idea of the curve expressing the compatibility between 
measurements and concepts. 

A general methodology will now be described and illustrated with an 
application. It is of special interest in the processing of borderline cases, and offers 
the physician practical assistance in obtaining the same results for the same abnormal 
profiles. 


PATTERN (MEDICAL KNOWLEDGE) 

In this paper, a pattern of Medical Knowledge consists of a tableau with 
linguistic entries. These linguistic associations are supposed to be given by experts, 
bearing in mind that different experts may provide somewhat different 
characterisations for the same pattern. 

This Medical Knowledge can be interpreted in terms of fuzzy propositions, 
like "Temperature is Slightly_Increased," i.e. of the form "S is F," where S is a 
variable (referred to as the name of a Sign, of a Symptom, or generally of an 
Attribute) taking values in a universe of discourse U, and F is a fuzzy subset of U. 
The tableau expresses relationships between attributes (S) such as temperature, 
plasma lipids, arterial pressure, serum proteins, etc., and diagnoses (A) or groups, 
types, syndromes, diseases, etc. The linguistic entries are assumed to be labels of 
fuzzy sets (F), or more specifically, fuzzy intervals. Note that the term "diagnosis" is 
more or less arbitrary; it is a convenient way to summarize or synthesize 
information. In decision processes, symptoms can be viewed as diagnoses and 
vice versa. Characterizations of diagnoses appear in the rows of a tableau as shown in 
fig. 1. 


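[Figure: tableau whose columns are the ATTRIBUTES $S_1 \ldots S_n$; the row labels 
and the entries are not recoverable from the source.] 

Fig. 1 - Tableau with linguistic entries represented by fuzzy sets. 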

In this tableau, $S_i$ ($i = 1, \ldots, n$) is the name of a variable (Sign, Symptom, or 
Attribute) taking values in a universe of discourse $U_i$, and $F_i$ is a fuzzy subset of 
$U_i$. For example, in a typical serum protein pattern [3], one may find the tableau of 
fig. 2. 

[Figure: row "Cirrhosis" of a SERUM PROTEINS tableau, with columns Total protein, 
Albumin and three Globulin fractions; the entries read Decrease, Decrease, Frequent 
Decrease, Slight Increase, Increase.] 

Fig. 2 - Part of a serum protein pattern. 


The Medical Knowledge represented by a generic Diagnosis A in the tableau of 
fig. 1 is interpreted as conjunctions (ANDs) of elementary propositions: 

A IF $P_1$ AND ... AND $P_i$ ... AND $P_n$, 

where for $i = 1, \ldots, n$, $P_i$ takes the form "$S_i$ is $F_i$." For example (see fig. 2): 

Cirrhosis IF Total proteins ($S_1$) are Decreased ($F_1$) 
AND ... AND α-Globulins ($S_i$) are Frequently Decreased ($F_i$) 
AND ... AND γ-Globulins ($S_n$) are Increased ($F_n$). 

Here is another example [4], in the framework of inflammatory protein variations: 

Vasculitis IF C3-Complement Fraction is Decreased or Normal 
AND Alpha-1-Antitrypsine is Decreased or Normal 
AND Orosomucoid is Increased 
AND Haptoglobin is Very Increased 
AND C-Reactive Protein is Very Increased. 
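
A row of the tableau can be held as a mapping from attributes to linguistic entries, from which the proposition is rendered. The sketch below transcribes the Vasculitis pattern; the helper names are ours, and the use of the minimum to grade a conjunction is a common convention in fuzzy logic that we assume here for illustration, not a statement of the method used in this paper.

```python
# Vasculitis pattern transcribed from the text: attribute -> linguistic entry.
vasculitis = {
    "C3-Complement Fraction": "Decreased or Normal",
    "Alpha-1-Antitrypsine": "Decreased or Normal",
    "Orosomucoid": "Increased",
    "Haptoglobin": "Very Increased",
    "C-Reactive Protein": "Very Increased",
}


def as_proposition(diagnosis, pattern):
    """Render a tableau row as 'A IF S_1 is F_1 AND ... AND S_n is F_n'."""
    body = " AND ".join(f"{s} is {f}" for s, f in pattern.items())
    return f"{diagnosis} IF {body}"


def conjunction(grades):
    """Grade of a conjunction of propositions, using min for AND (assumption)."""
    return min(grades)


rule_text = as_proposition("Vasculitis", vasculitis)
overall = conjunction([1.0, 0.8, 0.6, 0.9, 1.0])  # hypothetical grades per attribute
```

Keeping the pattern as data rather than as text makes it easy to attach a fuzzy set to each linguistic entry later on.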

In the characterisation of Vasculitis, one has for example "Haptoglobin is Very 
Increased," where "Very Increased" is the label of a fuzzy set "VERY INCREASED", 
depicted in fig.3. 



The information contained in "Haptoglobin is Very Increased" does not provide a 
precise characterisation of the numerical values to be assigned to a variable named 
"Haptoglobin," but it indicates a soft constraint on its possible values. In the pattern 
of Medical Knowledge, the fuzzy sets are fuzzy intervals that extend the definition of 
usual (crisp) intervals. Fuzzy intervals are here of three types: they fuzzify crisp 
intervals and they mean "fuzzily greater (or smaller)" than a given value a, or "fuzzily 
between" two values b and c (for example, fuzzy intervals representing NORMAL 
ranges are usually of this last type). Values like a, b, c have a grade of membership 
equal to 0.5. In particular, a fuzzy [b,c]-type interval can reduce to a fuzzy number D, 
meaning "around a value d" (see fig. 4). In this case, the bandwidth is the separation 
between the two values having a 0.5 grade of membership; it is a convenient 
fuzziness indicator of the fuzzy number D. 




[Figure: membership function of the fuzzy number D plotted against numerical 
values, with the bandwidth marked.] 

Fig. 4 - Fuzzy number D, meaning "around d." 
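
A fuzzy interval and its bandwidth can be sketched with a trapezoidal membership function. The parametrization by four breakpoints (a, b, c, d) and the numerical values are our own choices for illustration; a triangular fuzzy number is the special case b = c.

```python
def trapezoid(a, b, c, d):
    """Membership function equal to 1 on [b, c], 0 outside (a, d), linear in between."""
    def mu(u):
        if b <= u <= c:
            return 1.0
        if u <= a or u >= d:
            return 0.0
        if u < b:
            return (u - a) / (b - a)
        return (d - u) / (d - c)
    return mu


def bandwidth(a, b, c, d):
    """Separation between the two points whose grade of membership is 0.5."""
    left = (a + b) / 2    # midpoint of the rising edge
    right = (c + d) / 2   # midpoint of the falling edge
    return right - left


around_10 = trapezoid(8, 10, 10, 12)  # fuzzy number "around 10" (triangular)
width = bandwidth(8, 10, 10, 12)
```

Because only the breakpoints are stored, such a function is easy to modify during a training phase.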


It is very important for these membership functions to be easily modifiable during 
the training phase, for their evaluation. For simplicity, their shapes have been chosen 
here to be trapezoidal or triangular. In practice, it is not very important to set, for 
example, 0.7 or 0.75 as grades of membership when the curves are empirically 
designed. What matters most is the monotonicity of the function and the position 
of strategic values, i.e. values with grades of membership equal to 0, 0.5 or 1. 
Usually, for each type of laboratory analysis, the biologist determines for a specific 
purpose or, more generally, refers to a variation range in which the normal 
quantitative measurements should fall. He/she has a rough idea of the limits for 
abnormalities, having in mind more or less well-defined intervals. To determine a 
patient's condition, it is then sufficient to check in which interval the measured value 
falls. If we consider a non-fuzzy proposition of the form: "$S_i$ is a number in the 
interval [2,5]," we mean that any number in the interval [2,5] is a possible value to be 
assigned to the variable $S_i$ and it is not possible for a number outside this interval to 
be assigned to $S_i$. In other words, for $u_i$ in the universe of discourse $U_i$: 

Possibility [$S_i = u_i$] = 1 for $2 \le u_i \le 5$ 
                          = 0 for $u_i < 2$ or $u_i > 5$. 

Returning now to the fuzzy case, the proposition "Haptoglobin is Very Increased"
(i.e. of the form "$S_i$ is $F_i$") means that:

$$\mathrm{Possibility}[\mathrm{Haptoglobin} = u_i] = \mu_{\mathrm{VERY\_INCREASED}}(u_i),$$

or

$$\mathrm{Possibility}[S_i = u_i] = \mu_{F_i}(u_i).$$





ASSIGNMENT OF DIAGNOSES TO PATIENTS 

For each diagnosis, a given patient will be assigned a grade between 0 and 1. In
typical cases, one diagnosis will have a grade equal (or close) to 1 and all the other
diagnoses will have a grade equal (or close) to 0. The interesting cases will be the
intrinsically fuzzy ones, i.e. several diagnoses assigned to a patient, with grades
between 0 and 1.

Let us consider a diagnosis A, characterised by "($S_1$ is $F_1$) AND ... AND
($S_i$ is $F_i$) AND ... AND ($S_n$ is $F_n$)". The attributes $S_1, \ldots, S_i, \ldots, S_n$
have to be measured on the patient, yielding the values:

$S_1$(patient) = $d_1$ in $U_1$, ..., $S_i$(patient) = $d_i$ in $U_i$, ..., $S_n$(patient) = $d_n$ in $U_n$.

Then,

$$\mathrm{Possibility}\{S_1(\mathrm{patient}) = d_1, \ldots, S_n(\mathrm{patient}) = d_n, \text{ GIVEN ``}(S_1 \text{ is } F_1) \text{ AND } \ldots \text{ AND } (S_n \text{ is } F_n)\text{''}\} = \mathrm{MIN}(\mu_{F_1}(d_1), \ldots, \mu_{F_n}(d_n)),$$

where the MIN operator usually translates the conjunction AND. Finally, the minimum
of the above numbers provides a grade of compatibility of the patient's condition with
diagnosis A. The same operations are performed for all diagnoses, yielding a ranking
of the diagnoses for the patient.
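
The MIN aggregation and the resulting ranking can be sketched as follows. This is
only an illustration: the two diagnoses "A" and "B", the attributes "s1" and "s2",
the linear ramps and the measured values are hypothetical and do not come from the
P.B.I.S. pattern.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def ramp_up(lo: float, hi: float) -> Membership:
    """Hypothetical "fuzzily greater" interval: 0 below lo, 1 above hi."""
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))


def ramp_down(lo: float, hi: float) -> Membership:
    """Hypothetical "fuzzily smaller" interval: 1 below lo, 0 above hi."""
    return lambda x: min(1.0, max(0.0, (hi - x) / (hi - lo)))


def grade(profile: Dict[str, Membership], measured: Dict[str, float]) -> float:
    """Grade of compatibility: MIN over attributes of mu_F(d)."""
    return min(mu(measured[name]) for name, mu in profile.items())


def rank(profiles: Dict[str, Dict[str, Membership]],
         measured: Dict[str, float]) -> List[Tuple[str, float]]:
    """Diagnoses sorted by decreasing grade of compatibility."""
    grades = [(name, grade(profile, measured)) for name, profile in profiles.items()]
    return sorted(grades, key=lambda item: item[1], reverse=True)


profiles = {
    "A": {"s1": ramp_up(1.0, 2.0), "s2": ramp_down(1.0, 2.0)},
    "B": {"s1": ramp_down(1.0, 2.0), "s2": ramp_up(1.0, 2.0)},
}
measured = {"s1": 1.75, "s2": 1.25}
print(rank(profiles, measured))  # prints [('A', 0.75), ('B', 0.25)]
```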

In fact, the measured data are often fuzzy in at least two aspects:

i) imprecision in measurements,

ii) interpretation of the values,

so that it is natural to transform each measured (numerical) value into a fuzzy number
(as in fig. 4), e.g. "$S_i$(patient) = $d_i$" is transformed into "$S_i$(patient) is $D_i$". The
patient's condition is now expressed as a conjunction of fuzzy propositions involving
fuzzy numbers, so that one now has the following.

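$$\mathrm{Possibility}\{S_1(\mathrm{patient}) \text{ is } D_1 \text{ AND } \ldots \text{ AND } S_n(\mathrm{patient}) \text{ is } D_n, \text{ GIVEN ``}(S_1 \text{ is } F_1) \text{ AND } \ldots \text{ AND } (S_n \text{ is } F_n)\text{''}\} = \mathrm{MIN}(\pi(F_1,D_1), \ldots, \pi(F_n,D_n)),$$

where, for $i = 1, \ldots, n$,

$$\pi(F_i,D_i) = \mathrm{SUP}(F_i \cap D_i),$$

i.e.

$$\pi(F_i,D_i) = \mathrm{SUP}_{u_i \in U_i} \, \mathrm{MIN}[\mu_{F_i}(u_i), \mu_{D_i}(u_i)].$$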

$\pi(F_i,D_i)$ = Possibility ($D_i$ GIVEN $F_i$) is called a possibility measure [5]. It is
illustrated in fig. 5, where its numerical value indicates a weak compatibility of
"around $d_i$" with the fuzzy interval representing "Very Increased."



Fig. 5 - Possibility measure of $D_i$ with respect to $F_i$.
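
The sup-min definition of the possibility measure can be approximated numerically
by scanning a grid over the universe of discourse. The sketch below is an
illustration under assumptions: the grid bounds, the number of steps, the shape
chosen for "Very Increased" and the fuzzy number "around 3" are all hypothetical.

```python
from typing import Callable

Membership = Callable[[float], float]


def possibility(mu_f: Membership, mu_d: Membership,
                lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate pi(F, D) = SUP_u MIN[mu_F(u), mu_D(u)] on a regular grid."""
    best = 0.0
    for k in range(steps + 1):
        u = lo + (hi - lo) * k / steps
        best = max(best, min(mu_f(u), mu_d(u)))
    return best


def very_increased(u: float) -> float:
    """Hypothetical fuzzy interval: grade 0 below 4, grade 1 above 6."""
    return min(1.0, max(0.0, (u - 4.0) / 2.0))


def around_3(u: float) -> float:
    """Hypothetical triangular fuzzy number "around 3" with bandwidth 2."""
    return max(0.0, 1.0 - abs(u - 3.0) / 2.0)


pi = possibility(very_increased, around_3, lo=0.0, hi=10.0)
print(round(pi, 3))  # prints 0.25: a weak compatibility, as in the situation of fig. 5
```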
Finally, the patients are assigned a ranking in all diagnostic profiles, by
means of grades lying between 0 and 1. For each patient, the set of all diagnoses,
associated with their grades of assignment derived from the possibility measures (they
are numbers in the interval [0,1]), can be considered as a discrete fuzzy set,
$\mathcal{D}$. For each diagnosis (A), one has for example
$\mu_{\mathcal{D}}(A) = \mathrm{MIN}[\pi(F_1,D_1), \ldots, \pi(F_n,D_n)]$. $\mathcal{D}$
will be defuzzified, as shown in the sequel.

RELATIVE WEIGHTING 

In practice, some attributes might be less important than others in the
characterization of a diagnosis. For a given diagnosis, relative importance among
attributes can be expressed by means of weights ($\alpha$, $\beta$, $\gamma$, ...) ranging
in [0,1]. A weight of value "0" assigned to an attribute means that this attribute is not
important at all in the evaluation of the diagnosis and hence it can be deleted, whereas
a weight of value "1" does not modify the importance of the protein. Intermediate grades
of importance can be tuned by adjusting values of weights within the unit interval.

In the pattern, fuzzy propositions ("S is F," in the generic form)
characterizing a given group appear as conjunctions (ANDs). Assignment of a
weight $\alpha$ to take into account the relative importance of protein variations can
assume the following form [6,7], for F a fuzzy set in a universe of discourse U:

$$F^{\alpha} = \mathrm{MAX}(1-\alpha, F),$$

i.e. $\forall x \in U$, $\mu_{F^{\alpha}}(x) = \mathrm{MAX}[(1-\alpha), \mu_F(x)]$.

Generally, a t-conorm could replace the MAX operator in the above formula [8].
Limit cases have the following meanings.

$\alpha = 0$ : $\forall x \in U$, $\mu_{F^0}(x) = 1$, i.e. $F^0$ is neutral for conjunctions and
therefore it can be deleted,

$\alpha = 1$ : $\forall x \in U$, $\mu_{F^1}(x) = \mu_F(x)$, i.e. the weight has no effect.
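
The weighting formula and its two limit cases can be checked with a few lines of
Python. The helper name `weighted` and the sample membership function
`increased` are hypothetical and serve only as an illustration.

```python
from typing import Callable

Membership = Callable[[float], float]


def weighted(mu_f: Membership, alpha: float) -> Membership:
    """Membership function of F^alpha = MAX(1 - alpha, F)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return lambda x: max(1.0 - alpha, mu_f(x))


def increased(x: float) -> float:
    """Hypothetical fuzzy interval: grade 0 below 1, grade 1 above 2."""
    return min(1.0, max(0.0, x - 1.0))


neutral = weighted(increased, 0.0)    # alpha = 0: grade 1 everywhere
unchanged = weighted(increased, 1.0)  # alpha = 1: same grades as F
softened = weighted(increased, 0.3)   # grades are never lower than 1 - 0.3

print(neutral(0.0), unchanged(1.5), softened(2.0))  # prints 1.0 0.5 1.0
```

With $\alpha$ = 0.3 every grade is raised to at least 0.7, so a mismatch on this
attribute can no longer pull a MIN aggregation below that floor.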

In the case of Vasculitis, the following weights have been assigned, yielding
the modified rule:

Vasculitis IF C3-Complement Fraction is (Decreased or Normal)$^{0.1}$
           AND Alpha-1-Antitrypsine is (Decreased or Normal)$^{1.0}$
           AND Orosomucoid is (Increased)$^{0.8}$
           AND Haptoglobin is (Very Increased)$^{0.3}$
           AND C-Reactive Protein is (Very Increased)$^{0.8}$.

Note that C3-Complement Fraction could have been neglected (weight close to 0) and
that no weight need be assigned to "Decreased or Normal" in Alpha-1-Antitrypsine
(weight equal to 1, i.e. no effect of the weight). For example, the modified fuzzy
variations of Haptoglobin (with weight 0.3 assigned to "Very Increased") and the
corresponding modified possibility measure are presented in figure 6.

Fig. 6 - Possibility measure of $D_i$ with respect to a weighted $F_i$.


DEFUZZIFICATION 

If the patients are to be assigned non-fuzzy diagnoses, we must defuzzify the
fuzzy set $\mathcal{D}$ of diagnoses that has been evaluated following a MIN aggregation.
For this purpose, one may use the concept of separating power [9], which is different
from the center of gravity method commonly employed in fuzzy control. The
separating power $s(\mathcal{D})$ evaluates to what extent a fuzzy set, like $\mathcal{D}$, of a
universe of discourse U (U is here the set of the given diagnoses under study)
optimally separates U into a non-fuzzy partition (A, A'), where A' is the complement
set of A. The set A is defined as follows:

$$s(\mathcal{D}) = \mathcal{D} * A = \sup \{ \mathcal{D} * B \text{ such that } U \supset B, \; B \neq \emptyset \},$$

in which

$$\mathcal{D} * B = \left| \frac{\mathrm{card}(\mathcal{D}_B)}{\mathrm{card}(B)} - \frac{\mathrm{card}(\mathcal{D}_{B'})}{\mathrm{card}(B')} \right|,$$

where $\mathcal{D}_B$ denotes the restriction of $\mathcal{D}$ to B, card(B) is the cardinality of B, and
card($\mathcal{D}_B$) is the fuzzy cardinality of $\mathcal{D}_B$; for example, card($\mathcal{D}_B$) [...]

Applying the separating power to the fuzzy set $\mathcal{D}$ yields the optimal partition
((A, A') above) associated with $\mathcal{D}$. A is finally the (non-fuzzy) set of diagnoses
assigned to the patient.
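
A brute-force sketch of this defuzzification is given below. It rests on two
assumptions that are not stated in the text above: the fuzzy cardinality of a
restriction is taken to be the sum of its grades of membership, and B ranges over
the non-empty proper subsets of U so that card(B') is never zero. Because
$\mathcal{D} * B = \mathcal{D} * B'$, the code maximises the signed difference, which
selects the side of the partition with the higher grades. The four diagnoses
and their grades are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


def separating_power(grades: Dict[str, float]) -> Tuple[float, FrozenSet[str]]:
    """Return (s(D), A) by enumerating all non-empty proper subsets B of U."""
    labels = sorted(grades)
    if len(labels) < 2:
        raise ValueError("at least two diagnoses are needed")
    total = sum(grades.values())
    best_score, best_subset = float("-inf"), frozenset()
    for size in range(1, len(labels)):
        for subset in combinations(labels, size):
            inside = sum(grades[x] for x in subset)
            mean_in = inside / size                             # card(D_B) / card(B)
            mean_out = (total - inside) / (len(labels) - size)  # card(D_B') / card(B')
            score = mean_in - mean_out
            if score > best_score:
                best_score, best_subset = score, frozenset(subset)
    return best_score, best_subset


grades = {"d1": 0.85, "d2": 0.10, "d3": 0.05, "d4": 0.0}
s, a = separating_power(grades)
print(round(s, 2), sorted(a))  # prints 0.8 ['d1']
```

The enumeration is exponential in the number of diagnoses, which stays cheap for a
pattern with about a dozen groups.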

APPLICATION TO INFLAMMATORY PROTEIN 
VARIATIONS 

This application is reported from [4]. The following five proteins, involved
in biological inflammatory reactions, have been chosen.

- C3 (C3-Complement Fraction)

- A1AT (Alpha-1-Antitrypsine)

- Om (Orosomucoid)

- Hpt (Haptoglobin)

- CRP (C-Reactive Protein).

The Protein-Biological_Inflammatory_Syndrome (P.B.I.S.) pattern contains eleven
groups :


- Normal condition 
- Eight Biological Inflammatory Syndromes :

. Bacterial Infections 
. Viral Infections 
. Vasculitis 
. Nephrotic syndromes 
. Acute Glomerular Nephritis 
. Intravascular Hemolysis with inflammation 
. Collagen Diseases non Lupus and without infection 
. Lupus 

- Intravascular Hemolysis without inflammation 

- Glomerular Renal Insufficiency without inflammation 

The protein variations can be easily interpreted in linguistic terms by physicians, so
that the P.B.I.S. pattern is well adapted to a fuzzy sets representation. The fuzzy
propositions in this pattern have been interpreted in a linguistic tableau form (one of
its rows is reproduced in Table 7).

Table 7 - Linguistic characterisation of Vasculitis in the P.B.I.S. pattern.

The fuzzy sets corresponding to this linguistic pattern have been established for each
entry of the tableau (one of its rows is in Table 8).

[Table: the columns are the PROTEINS C3, A1AT, Om, Hpt, CRP; the reproduced row is Vasculitis.]

Table 8 - Fuzzy sets characterisation of Vasculitis in the P.B.I.S. pattern.

In this study, fuzzy numbers issued from measurements over patients have
been compared with the corresponding fuzzy sets in the P.B.I.S. pattern, by means of
three measures or indexes hereafter defined: possibility measure ($\pi$), necessity
measure ($\nu$), truth-possibility index ($\rho$).

For each protein (S), let F be a fuzzy set characterizing S in a diagnostic group, and
let D be the fuzzy number issued from the seric level of S, measured over a patient.

i) Possibility measure [5]. By definition, $\pi(F,D) = \mathrm{Sup}(F \cap D)$.

ii) Necessity measure [10]. By definition, $\nu(F,D) = 1 - \pi(\bar{F},D)$, where $\bar{F}$
denotes the fuzzy complement of F, i.e. $\bar{F} = 1 - F$. Note that
$\nu(F,D) = 1 - \mathrm{Sup}(\bar{F} \cap D) = \mathrm{Inf}(F \cup \bar{D})$.

iii) Truth-possibility index [11,12]. By definition, $\rho(F,D) = \pi(\tau_0, \tau_1)$, where
$\tau_0$ and $\tau_1$ are related to truth-qualification [5], according to the semantic entailment:

$$[(S \text{ is } F) \text{ is } \tau_1] \rightarrow X \text{ is } D \rightarrow [(S \text{ is } F) \text{ is } \tau_0].$$

For the special case of fuzzy sets in this study, one can easily show that
$\rho(F,D) = \mu_F(d)$, where D means "around d". Moreover, one can show that the
following ranking holds [13] (see figure 9 for an illustration):

$$\nu \le \rho \le \pi,$$

so that these indexes can be chosen according to optimistic or pessimistic
considerations.
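
The three indexes can be compared numerically on a small case. The sketch below
evaluates them on a regular grid; the fuzzy interval F, the fuzzy number D "around
5.5", the grid bounds and the step are hypothetical choices made only to
illustrate the ranking of the three values.

```python
from typing import Callable, List

Membership = Callable[[float], float]


def grid(lo: float, hi: float, steps: int) -> List[float]:
    return [lo + (hi - lo) * k / steps for k in range(steps + 1)]


def possibility(mu_f: Membership, mu_d: Membership, us: List[float]) -> float:
    """pi(F, D) = Sup(F intersected with D), approximated on the grid."""
    return max(min(mu_f(u), mu_d(u)) for u in us)


def necessity(mu_f: Membership, mu_d: Membership, us: List[float]) -> float:
    """nu(F, D) = Inf(F union complement of D), approximated on the grid."""
    return min(max(mu_f(u), 1.0 - mu_d(u)) for u in us)


def mu_f(u: float) -> float:
    """Hypothetical fuzzy interval F: grade 0 below 4, grade 1 above 6."""
    return min(1.0, max(0.0, (u - 4.0) / 2.0))


d = 5.5


def mu_d(u: float) -> float:
    """Hypothetical triangular fuzzy number D "around 5.5" with bandwidth 1."""
    return max(0.0, 1.0 - abs(u - d))


us = grid(0.0, 10.0, 10_000)
nu = necessity(mu_f, mu_d, us)
rho = mu_f(d)  # truth-possibility index in the special case "around d"
pi = possibility(mu_f, mu_d, us)
print(nu <= rho <= pi)  # prints True
```

Here the pessimistic index is 0.5, the truth-possibility index is 0.75 and the
optimistic index is close to 0.83.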





Fig. 9 - Compatibility measures or indexes.


For each of the eleven groups, comparison of a patient's condition with the
pattern yields five (one for each protein) triples of numbers $(\nu_i, \rho_i, \pi_i)$,
$i = 1, \ldots, 5$, which are aggregated by means of the MIN operator, expressing
conjunctions: $(\nu, \rho, \pi) = (\mathrm{MIN}_i\, \nu_i, \mathrm{MIN}_i\, \rho_i, \mathrm{MIN}_i\, \pi_i)$.
Finally, for each patient, one has three different rankings of diagnoses derived from
$\nu$, $\rho$, $\pi$.

ILLUSTRATIVE EXAMPLE 

This patient case, which we report from [4], has been medically diagnosed as
Vasculitis. The protein profile of this patient is given as follows.



                   C3      A1AT    Om      Hpt     CRP

raw data (g/l)     1.50    2.26    1.75    10.0    0.060

normalized data    1.85    1.00    1.99    5.59    10.0


For simplicity, we only present the matching results from the possibility measure
(the $\pi_i$'s) and the corresponding crisp partition (A, A') that has been found to be
associated with the fuzzy set of diagnostic groups $\mathcal{D}$.

A = {Collagen Diseases} ($\min_i \pi_i = 0.43$), $s(\mathcal{D}) = 0.40$.

A' consists of all the remaining diagnostic groups. Vasculitis does not appear here, for
one of the possibility measures, $\pi_1 = \pi$(Vasculitis, C3), is nearly equal to zero.
Hence, the MIN operator acting on the $\pi_i$'s produces a value practically equal to zero,
whatever values are computed from the other possibility measures associated with
Vasculitis. In fact, for Vasculitis, the possibility measure results are as follows.

             C3      A1AT    Om      Hpt     CRP     min_i π_i

π_i's        0.04    1       0.82    1       1       0.04

The four proteins (A1AT, Om, Hpt and CRP) have a high grade of matching, and
Vasculitis is rejected only because of the mismatch due to C3. But as already pointed
out, in the case of Vasculitis, C3 can be nearly neglected (weight equal to 0.1).
Hence, in a weighted process, one will derive:

A = {Vasculitis} ($\min_i \pi_i = 0.85$), $s(\mathcal{D}) = 0.78$.

The right diagnostic group of Vasculitis now appears, and it is computed with a
better separating power (0.78) than in the case of a non-weighted process (0.40).

We recall now the weights of importance associated with the five proteins in
the characterisation of Vasculitis. For this patient's case, we also give the matching
results in the non-weighted process, followed by the ones of the weighted process,
using only the $\pi_i$'s.

                        C3      A1AT    Om      Hpt     CRP     min_i π_i

Weights (Vasculitis)    0.1     1.0     0.8     0.3     0.8

π_i's (non-weighted)    0.04    1       0.82    1       1       0.04

π_i's (weighted)        0.9     1       0.85    1       1       0.85
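
The MIN aggregation of the two rows of possibility measures reported above can be
replayed in a few lines. The numbers are the ones given in the table; the helper
`aggregate`, which also reports the limiting protein, is a hypothetical name
introduced for illustration.

```python
from typing import List, Tuple

proteins = ["C3", "A1AT", "Om", "Hpt", "CRP"]
pi_non_weighted = [0.04, 1.0, 0.82, 1.0, 1.0]  # values reported in the table
pi_weighted = [0.9, 1.0, 0.85, 1.0, 1.0]       # values reported in the table


def aggregate(pis: List[float]) -> Tuple[float, str]:
    """MIN over the proteins, together with the protein that sets the grade."""
    grade = min(pis)
    return grade, proteins[pis.index(grade)]


print(aggregate(pi_non_weighted))  # prints (0.04, 'C3')
print(aggregate(pi_weighted))      # prints (0.85, 'Om')
```

In the non-weighted process the single mismatch on C3 fixes the grade of
Vasculitis; once C3 is weighted down, Orosomucoid becomes the limiting protein.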


In an automatic classification process, the aggregation we have presented for
Vasculitis (A) has to be performed for all of the eleven diagnostic groups, yielding
for each patient a fuzzy set ($\mathcal{D}$) of diagnostic groups that can be defuzzified by
means of the separating power.

FUZZY LOGIC AND ARTIFICIAL NEURAL NETWORKS

We now introduce a model of fuzzy connectionist expert system, in which
an artificial neural network is designed to build the knowledge base of an expert
system from training examples (this model can also be used for specifications of
rules in fuzzy logic control).

Expert systems have shown some weaknesses, for example in the process of
eliciting knowledge from experts, in learning capabilities, or in producing poor results
at the limits of the system's domain of expertise. Neural networks offer
noticeable contributions to expert systems, such as: training by example, dynamical
adjustment to changes in the environment, ability to generalize, tolerance to noise,
graceful degradation at the border of the domain of expertise, ability to discover new
relations between variables. Fuzzy logic, supporting interpolative reasoning [14],
plays a key role in human cognitive systems; it lies at the base of pattern
classification, qualitative reasoning, analogical reasoning, case-based reasoning,
neural modeling, system identification and related fields. The standards of accuracy
and precision prevailing in traditional computers are presently questioned or discarded,
especially while narrowing the gap between human reasoning and machine reasoning.
In the context of approximate reasoning, expert systems and fuzzy logic control on
one side, and artificial neural networks on the other side, share common features and
techniques [15]. Connectionist network (or artificial neural network) tools are now
used in learning control problems like the cart-pole balancing system [16-18].
The combination of fuzzy logic with neural network theory enhances the capability
of intelligent systems to learn from experience and adapt to changes in an
environment with qualitative, imprecise, uncertain or incomplete information.

FUZZY CONNECTIONIST EXPERT SYSTEMS 

Fuzzy logic has been used in conjunction with artificial neural networks in a 
variety of recent papers [17-31]. In the spirit of S.I. Gallant's model [19] of 
connectionist expert system (CES), we proposed in [31] an expert classification 
system in which a connectionist model is used to extract or to tune the knowledge 
from a training set of examples. An important feature of this model is its fuzzy 
nature with an intrinsic treatment of fuzziness. Nevertheless, unlike in the CES 
model, fuzzy sets are not considered from their crisp representations. 

Inputs to the neural system are weighted, but we assume that weights are of
two types: primary weights, in general followed by secondary weights. Primary
weights express the main information on knowledge. They have a linguistic form and
they are interpreted as labels of fuzzy sets, meaning for example: Increased,
Decreased, Very-increased, Normal, etc., as in the application we just described.
Depending on applications, these fuzzy sets are defined over universes of discourse
related to the nature of the input cells or, as in fuzzy control, they can be members
of a given partition of the interval [-1,+1], with triangular shaped membership
functions (fig. 10) typically meaning "Negative Large (NL), Negative Medium (NM),
Negative Small (NS), Approximately Zero (ZR), Positive Small (PS), Positive
Medium (PM), Positive Large (PL)", or more simply having only the three
linguistic values "Decreased, Normal, Increased". Secondary weights are numbers in
[0,1]. They reflect the grade of weakness of the corresponding connection (the weaker
the connection, the closer to 1 the weight). They do not necessarily act on
connections, but when they do, they follow the primary weight they are combined
with.
The neuro fuzzy system is a feedforward network with no thresholds: fuzzy
sets avoid the use of thresholds, by considering graded transitions from one state to
the other. There are no directed cycles and no feedback; one iteration is sufficient for
inferencing. The training phase is not performed with methods involving weighted
sums of inputs, but with intrinsically fuzzy equations, using MIN and MAX
operators. This phase consists in finding the numerical weights from examples. It is
not required to find the membership functions of the primary weights in the general
case, for any universe of discourse. It is assumed that a human expert has a rough
idea of the shapes; the task is to tune the curves according to the information
provided by input-output examples. This is a general remark to keep in mind when
designing models of fuzzy connectionist expert systems (FCES).

Learning now mainly consists in finding the numerical secondary weights 
and the network topology : numerical weights close to "1" will indicate an absence 
of the corresponding primary weight, whereas numerical weights close to "0" will 
not influence at all the corresponding primary weight. Then the primary linguistic 
weights might be adjusted, when appropriate, by moving the slopes of the curves in 
the intrinsically fuzzy zone (grades of membership different from 0 and from 1). 

The neuro fuzzy network consists of connections between input cells ($S_j$),
output cells ($A_i$), and possible hidden cells ($H_{ij}$). Primary weights ($w_{ij}$) are
linguistic labels of fuzzy sets, characterizing the variations of the input cells
("$S_j$ is $w_{ij}$") in relation with the output cells (see fig. 11).


[Figure: an Input Cell connected to an Output Cell through a (linguistic) weight.]

Fig. 11 - Connection with only a primary (linguistic) weight

We assume, depending on the context, that $w_{ij}$ denotes indifferently (as no
confusion arises) a linguistic weight or the associated fuzzy set. Secondary weights
($b_{ij}$) are numbers in the unit interval. In the network, input cells have connections
pointing either to hidden cells, followed by connections towards output cells
(fig. 12), or directly to output cells (this case corresponds to a numerical weight
equal to 0), but not necessarily to all output cells (no connection at all corresponds to
a numerical weight equal to 1). As soon as a connection is issued from an input cell,
a linguistic weight exists, but a numerical weight does not always exist, namely when
there is no hidden cell. Hidden cells have only numerical weights associated with
connections towards output cells.



[Figure: an Input Cell connected through a (linguistic) weight $w_{ij}$ to a Hidden Cell, itself connected through a (numerical) weight to an Output Cell.]

Fig. 12 - General connection with a primary (linguistic) weight
and a secondary (numerical) weight


Input cells can take on numerical values or fuzzy numbers, in their
underlying universe of discourse. When the input cells $S_j$'s are given, output cells
$A_i$'s are computed according to the following formula (combination of weights for
inferencing):

$$A_i = \mathrm{MIN}_j \, \mathrm{MAX}[b_{ij}, \mu_{w_{ij}}(d_j)] \quad \text{for numerical } d_j\text{'s},$$

or else

$$A_i = \mathrm{MIN}_j \, \mathrm{MAX}[b_{ij}, \pi(w_{ij}, D_j)] \quad \text{for fuzzy numbers } D_j\text{'s},$$

where:

- $d_j$'s are numerical values assigned to $S_j$'s,

- $D_j$'s are fuzzy numbers meaning "around $d_j$'s,"

- $\mu_{w_{ij}}(d_j)$ is the grade of membership of $d_j$ in $w_{ij}$,

- $\pi(w_{ij}, D_j)$ is the possibility measure of $D_j$ GIVEN $w_{ij}$.

Of course, a mixed formula for an $A_i$ can involve both numerical $d_j$'s and fuzzy
numbers $D_j$'s, and in the above formula, t-norms and t-conorms could replace MIN
and MAX operators, respectively.
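
For numerical inputs, the inference formula reduces to a MIN over the connections
of MAX[b, grade of membership]. The sketch below computes one output cell; the
connection list, the input names "s1" and "s2", the ramp-shaped primary weights and
the numerical weights are hypothetical values chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]
# One connection: (input cell name, primary weight as a fuzzy set, secondary weight b)
Connection = Tuple[str, Membership, float]


def ramp_up(lo: float, hi: float) -> Membership:
    """Hypothetical primary weight: grade 0 below lo, grade 1 above hi."""
    return lambda x: min(1.0, max(0.0, (x - lo) / (hi - lo)))


def infer(connections: List[Connection], inputs: Dict[str, float]) -> float:
    """Output cell value: MIN over j of MAX[b_ij, mu_wij(d_j)]."""
    return min(max(b, mu(inputs[name])) for name, mu, b in connections)


connections = [
    ("s1", ramp_up(1.0, 2.0), 0.0),  # direct connection: numerical weight 0
    ("s2", ramp_up(1.0, 2.0), 0.7),  # weak connection: numerical weight 0.7
]
inputs = {"s1": 1.75, "s2": 1.25}
print(infer(connections, inputs))  # prints 0.7
```

The poor match on "s2" (grade 0.25) is masked by its numerical weight 0.7, so the
output is limited to 0.7 rather than 0.25.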


Let us now consider training examples, i.e. for an $A_i$, the
corresponding $S_j$'s connected to it are given.


1st case.- The $w_{ij}$'s are assumed to be known, at least as a rough
approximation, so that the unknowns are the $b_{ij}$'s. How to solve this type of equation
was presented early in [32] (see also [33] for extensions and more developments) in
the general case of complete dually Brouwerian lattices, in which the set of x's such
that MAX(a,x) ≥ b contains a least element, denoted $a \,\varepsilon\, b$ (note that
$a \,\varepsilon\, b$ is also defined in [0,1] as being equal to b if a < b and to 0 if a ≥ b).
In case of poor solutions, membership functions of the $w_{ij}$'s are adjusted by shifting
or changing the slopes (tuning).
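
The operator just defined on [0,1] can be checked by brute force against its
defining property, namely that it returns the least x such that MAX(a,x) is at
least b. The function name `eps` and the test grid are assumptions made for this
illustration; the sketch covers only the operator, not the full resolution
method of [32].

```python
from typing import List


def eps(a: float, b: float) -> float:
    """Least x in [0, 1] such that max(a, x) >= b."""
    return b if a < b else 0.0


def least_by_search(a: float, b: float, candidates: List[float]) -> float:
    """Same quantity obtained by scanning the candidate values of x."""
    return min(x for x in candidates if max(a, x) >= b)


candidates = [k / 10 for k in range(11)]
agree = all(eps(a, b) == least_by_search(a, b, candidates)
            for a in candidates for b in candidates)
print(eps(0.3, 0.7), eps(0.7, 0.3), agree)  # prints 0.7 0.0 True
```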

2nd case.- Neither the $w_{ij}$'s nor the $b_{ij}$'s are known, but the $w_{ij}$'s are
supposed to be members of a known finite fuzzy partition of [-1,+1], as in fuzzy
logic control (see fig. 10). Again, for each $w_{ij}$ of the fuzzy partition, the above
equation has to be solved.
BIOMEDICAL APPLICATION

We now illustrate the fuzzy connectionist network, using the same
biomedical domain as before: inflammatory protein variations. We consider the same five
proteins: C3-Complement Fraction (C3), Alpha-1-Antitrypsine (A1AT),
Orosomucoid (Om), Haptoglobin (Hpt), C-Reactive Protein (CRP) and, for
simplicity, only four diagnostic groups composed of the Normal condition and of
three biological inflammatory syndromes: Bacterial Infection, Vasculitis, Nephrotic
Syndromes.

The P.B.I.S. network we present here is depicted in fig. 13, in which the
five proteins correspond to the input cells $S_1, \ldots, S_5$ and the four groups to the
output cells $A_1, \ldots, A_4$. There are seven hidden cells associated with numerical
weights. The linguistic weights have the following meaning,

$w_{11}$ : normal,

$w_{12}$ : normal, $w_{22}$ : increased, $w_{32}$ : decreased or normal, $w_{42}$ : decreased or normal,

$w_{13}$ : normal, $w_{23}$ : increased, $w_{33}$ : increased, $w_{43}$ : decreased or normal,

$w_{14}$ : normal, $w_{24}$ : increased, $w_{34}$ : very increased, $w_{44}$ : slightly increased or
increased,

$w_{15}$ : normal, $w_{25}$ : very increased, $w_{35}$ : very increased.


[Figure: network whose output cells are NORMAL, BACTERIAL INFECTION, VASCULITIS, NEPHROTIC SYNDROMES.]

Fig. 13 - P.B.I.S. neuro fuzzy network.


For example, in this network, Vasculitis ($A_3$) is connected with:

- A1AT ($S_2$) : decreased-or-normal ($w_{32}$),

- Om ($S_3$) : increased ($w_{33}$), with weight 0.2 ($b_{33}$),

- Hpt ($S_4$) : very-increased ($w_{34}$), with weight 0.7 ($b_{34}$),

- CRP ($S_5$) : very-increased ($w_{35}$), with weight 0.2 ($b_{35}$).

There is no connection with C3 ($S_1$), corresponding to a numerical weight 1 ($b_{31}$).
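
As an illustration only, the numerical weights listed above can be combined with
the MIN-MAX inference formula. The numerical weights are those of the text (the
direct A1AT connection is given weight 0); reusing the non-weighted possibility
measures of the earlier patient case as matching grades is an assumption made for
this sketch, and the resulting value is not a result reported by the authors.

```python
from typing import Dict

# Numerical (secondary) weights of the Vasculitis output cell.
b_vasculitis: Dict[str, float] = {
    "C3": 1.0,    # no connection
    "A1AT": 0.0,  # direct connection, no hidden cell
    "Om": 0.2,
    "Hpt": 0.7,
    "CRP": 0.2,
}

# Assumption: matching grades borrowed from the earlier illustrative patient case.
pi_patient: Dict[str, float] = {
    "C3": 0.04, "A1AT": 1.0, "Om": 0.82, "Hpt": 1.0, "CRP": 1.0,
}


def output_cell(b: Dict[str, float], pi: Dict[str, float]) -> float:
    """A_i = MIN over j of MAX[b_ij, pi(w_ij, D_j)]."""
    return min(max(b[name], pi[name]) for name in b)


print(output_cell(b_vasculitis, pi_patient))  # prints 0.82
```

Because C3 carries the numerical weight 1, its mismatch no longer affects the
Vasculitis cell in this sketch.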
We are now exploring this biomedical application in practice (computed
results will be presented in a forthcoming extended version of the last section of this
paper) and we are studying an application of this method to handwritten character
recognition.
INTRODUCTION

One of the most interesting aspects of Expert Systems research is to
gain some insight into human problem solving strategies by trying to
emulate them in programs. Experts in a domain are better than novices at
performing problem solving tasks. This is due to their greater experience in
solving problems, which provides them with better strategies. Such strategies are
knowledge about how to use the knowledge they have in their domain of
expertise. This kind of knowledge is called metaknowledge and is represented
by means of meta-rules in the MILORD system for diagnostic reasoning.
Diagnostic reasoning heavily involves metaknowledge to focus attention on
the most plausible hypotheses or goals in a given situation and to control the
inference process. Furthermore, uncertainty also plays an important role at the
control level; for example, decisions are taken depending on the uncertainty of
the facts supporting them.

On the other hand, psychological experiments (Kuipers et al., 1989)
show that human problem solvers do not use numbers to deal with uncertainty
but symbolic descriptions expressing categorical and ordinal relations, and that
in complex situations the propagation and combination of uncertainty is a
local, context-dependent process. MILORD has a modular structure that makes it
possible to represent and manage uncertainty by means of local operators defined
over a set of ordered linguistic terms defined by the expert.

In this paper we describe the MILORD system, focusing on the
metaknowledge and on the role that uncertainty plays in such a modular system,
that is, its role in the local deductive mechanisms within each module and as a
control feature in the task of selecting and combining modules to achieve a
solution.

Before describing MILORD, the paper starts by presenting 
fundamental concepts on control structures for rule-based systems. 
INFERENCE CONTROL FOR RULE-BASED SYSTEMS

Inference control in problem solving is the aspect where
metaknowledge has been most heavily used (Aiello & Levi, 1988). Problem
solving consists in the activation of rules starting from a set of known facts.
The application of one rule may cause the activation of another one, resulting in
what is known as Rule Chaining. This can happen when the conclusions of one
rule match the conditions of another. In general there is more than one rule
that may be applicable at the same time, but only one must be selected. This
situation is known as the conflict resolution problem. Most expert systems
make arbitrary choices, such as selecting the first rule in the list of applicable
rules, or the one containing the most conditions, etc. On the other hand,
having control knowledge represented by meta-rules makes it possible to reason
about which rule should be applied, that is, the system can dynamically decide
which is the best object-level inference to perform.

Another important aspect in problem solving is the control flow, that is,
the order in which the modules and submodules will be executed. In traditional
software the control structure is fixed: one module calls other modules to
execute its subtasks and the calling sequence imposes the order of execution of
the tasks. Control structures in expert systems cannot be as rigid, because
often the expert has to adapt the order of execution of modules based on
opportunities or obstacles that may arise. Such opportunistic problem solving
behaviour is driven by what we call strategic knowledge, and it is also part of
the control meta-level. This control knowledge is extremely important because
it makes it possible to closely emulate human problem solving behaviour and
therefore increases the credibility of the expert system.

An example of a meta-rule representing strategic problem solving
knowledge is (Godo et al., 1989):

IF pneumonia is suspected and patient has AIDS
THEN consider first the modules: P-CARINII, TBC, CITOMEGALOVIRUS,
     CRIPTOCOCCUS
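
A meta-rule of this kind can be read as a function that reorders the agenda of
modules once its conditions hold. The Python sketch below is not MILORD code: the
class `MetaRule`, the function `reorder`, the fact strings and the initial agenda
are hypothetical, and placing the suggested modules at the front of the agenda is
an assumed reading of "consider first".

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class MetaRule:
    conditions: FrozenSet[str]     # facts that must all be known
    first_modules: Tuple[str, ...]  # modules to consider first


def reorder(agenda: List[str], facts: Set[str], meta_rules: List[MetaRule]) -> List[str]:
    """Move the modules suggested by applicable meta-rules to the front."""
    preferred: List[str] = []
    for rule in meta_rules:
        if rule.conditions <= facts:  # every condition is among the facts
            for module in rule.first_modules:
                if module not in preferred:
                    preferred.append(module)
    rest = [module for module in agenda if module not in preferred]
    return preferred + rest


rule = MetaRule(
    conditions=frozenset({"pneumonia is suspected", "patient has AIDS"}),
    first_modules=("P-CARINII", "TBC", "CITOMEGALOVIRUS", "CRIPTOCOCCUS"),
)
agenda = ["BACTERIAL", "TBC", "VIRAL"]  # hypothetical initial agenda
facts = {"pneumonia is suspected", "patient has AIDS"}
print(reorder(agenda, facts, [rule]))
```

When the conditions are not all satisfied, the agenda is returned unchanged.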

It is important to make clear that we have two levels of reasoning: 
object-level and control-level. 

The object-level is where the inferences about the problem domain are 
performed. At this level we have the rules that represent knowledge about the 
domain as well as descriptions of objects, properties and relations in the 
domain. 

The control-level is concerned with the problem solving strategies, that 
is, it controls in which order the tasks and subtasks will be executed. More 
sophisticated expert systems may have several control levels like in MILORD 
where there is a level whose goal is to combine different sequences of goals 
resulting from the application of more than one control meta-rule as we will see 
later. 

The overall problem solving control flow jumps back and forth between
those levels. Part of the reasoning takes place at the control-level to deduce the
next task (module) to be executed. Then, reasoning proceeds at the object-level
(inside the module) to deduce new domain facts. As a result, new
control meta-rules might be applied that could suggest a new sequence of goals
to be considered and combined with the previous one. This combined strategy
will then be executed, and so on. Later in the paper we will describe this
process in more detail.

UNCERTAINTY MANAGEMENT 

Most AI research on reasoning under uncertainty is concerned with
normative methods to propagate and combine certainty values, and there is
some disagreement between the proponents of the different methods
(Bayesians, Dempster-Shaferians, Fuzzy logicians, etc.). However, these
methods do not really claim to closely mimic human problem solving under
uncertainty. Although human problem solvers are almost always uncertain
about the possible solution in complex domains, they often achieve their goals
despite uncertainty by using methods that are particularized to the type of
problem solving that they are performing at a given time. In fact, as Cohen
et al. (1987) put it, managing uncertainty consists in selecting actions that
simultaneously achieve solutions and reduce their uncertainty. This view leads
us to consider uncertainty as playing an important role at the control level,
because it is useful to constrain the focus of attention (which part of the
problem to work on next) and action selection (how to work on it), as will be
shown in the framework of MILORD.

Furthermore, we believe that large complex expert systems draw their
problem solving capabilities more from the power of the structure and control
of their knowledge bases than from the particular uncertainty management
formalism they use. On the other hand, the structure in the knowledge bases
makes the propagation and combination of uncertainty a local, context
dependent process.

MODULARITY AND LOCALITY 

A knowledge base (KB) is a large set of knowledge units that covers a 
domain of expertise and provides solutions to problems in that domain of 
expertise. 

When faced with a particular case, human experts use only a subset of
their knowledge, for two reasons: the adequacy of the general knowledge - the
theory - to the particular problem, and the availability and cost of data. For
example, the suspicion of a bacterial disease will rule out all knowledge
referring to viral diseases; likewise, a patient in a coma will render useless all
the knowledge units that need the patient's answers.

The fitting of general knowledge to a particular problem is done at
a certain level of granularity; for instance, the expert uses all the knowledge
related to the diagnosis of a colon neoplasm or the knowledge related to the
radiological analysis of a chest x-ray.

In particular, the structuring of KB's in MILORD takes this granularity
in the use of knowledge into account.

Each structural unit or theory (module from now on) will define an
indivisible set of knowledge units (for example rules and predicates). The
control will be responsible for the combination of the modules. The combination
will represent the particularization of general knowledge to the problem that is
being solved. The control will determine which combinations are acceptable.
For example, a module that determines the dose of penicillin that has to be
given to a patient must not be present in any acceptable combination for a
patient allergic to penicillin.

The modularization of KB's leads to the concept of locality in the
modules of a KB. It is possible to define the contents of a module independently
of the definition of the rest of the modules. This possibility, which is
methodologically desirable, allows the use of different local logics and reasoning
mechanisms adapted to the subtasks that the system is performing.

MODULARITY OVER MILORD: THE COLAPSES 
LANGUAGE 

The basic units of KB's written in our language, COLAPSES, are the 
modules. These may be hierarchically organized, and consist of an 
encapsulated set of import, export, rule, meta-rule and submodule 
declarations. The declarations of submodules in a module are what structure 
the hierarchy. The declaration of a submodule does not differ from the 
declaration of a module. We shall briefly outline the meaning of the 
primitive components of a module. A complete definition of the language and 
its semantics can be found in (Sierra, Agusti, 1990). 

Import: determines the non-deducible facts needed in the module to 
apply the rules. These facts are to be obtained from the user at run time. 

Export: defines which facts deduced or imported inside a module are 
visible from the modules that include the module as a submodule. 

Rule: defines the deductive units that relate the import and the export 
components within a module. 

Metarule: defines the meta-logical component of the module. Thus, the 
meta-rules of a module control the execution of the rules in the module and 
the execution of the submodules in the hierarchy underneath the module. 

The syntax of a module definition is as follows: 

Module modid = modexpr 

where modid stands for an identifier of the module and modexpr for the body of 
the definition made out of the components specified above. Let us look at an 
example of module definition. 

    Module gram esputum = 
    begin 
        import Class, Morphology 
        export morpho, esputum ok 
        deductive knowledge: 
            Rules: 
                R001 If class > 4 then esputum ok is sure 
        end deductive 
    end 

There is also the possibility of defining generic modules that represent 
functional abstractions of several non-generic modules. 

LOCAL LOGICS 

It is clear that experts use different approaches to the management of 
uncertainty depending on the task they are performing. Expert system 
building tools usually provide a fixed way of dealing with uncertainty, 
proposing a unique and global method for representing and combining 
evidence. In the COLAPSES language it is possible to define a different 
deduction procedure for each one of the modules. If, from a methodological 
point of view, a task is associated with a module, then a different logic can be 
used depending on the task. 

Local logics are defined by means of the following primitive of the 
COLAPSES language: 

    Inference system: 
        Truth values = list of linguistic terms 
        Renaming = morphisms between linguistic terms 
        Connectives: 
            Conjunction = function definition 
            Disjunction = function definition 
        Inference patterns: 
            Modus ponens = function definition 

This primitive is included as a component of the deductive knowledge 
of a module. 

Next, we shall explain each one of the components of the local logic 
definition. 

Truth values. This component defines the set of linguistic terms that 
will be used in the logical valuation of facts, rules and meta-rules of the module 
where this logic is to be used. Different modules can have different sets of 
linguistic terms. 

Renaming. Modules in a KB define a hierarchy of tasks. Each of the 
modules can have a different logic, so it is necessary to define a way of 
interconnecting these different logics. In MILORD this is done in a declarative 
way. Each module that contains several submodules has a set of morphism 
definitions that translate the valuations of predicates in the submodules into 
valuations in the logic of the module. 

    Module B = 
    begin 
        Module A = 
        begin 
            Import C 
            Export P 
            Deductive knowledge: 
                Rules: 
                    R1 if C then conclude P is possible 
                Inference system: 
                    Truth values = (false, possible, true) 
            End deductive 
        end 

        Import D 
        Export Q 
        Deductive knowledge: 
            Rules: 
                R1 if A/P and D then conclude Q is quite possible 
            Inference system: 
                Truth values = (impossible, moderately possible, 
                                quite possible, sure) 
                Renaming = A/false ==> impossible 
                           A/possible ==> quite possible 
                           A/true ==> sure 
        end deductive 
    end 

Notice in the above example that the predicate P, exported by the 
submodule A of B and used in the rule defined in B, will be evaluated with 
one of the three values false, possible or true. To use this fact in the module B 
we need to replace that value with one that can be used by the logic 
defined in B. This is done by means of the renaming definition. 
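
The renaming of the example can be pictured as a simple lookup from the truth values of submodule A to those of module B. The following is only a minimal sketch under stated assumptions: the dictionary encoding and the function names are hypothetical (COLAPSES declares the morphism with "==>"), and the order-preservation check reflects our reading of "morphism", not a requirement stated in the text.

```python
# Hypothetical encoding of the renaming declared in module B of the example.
RENAMING_A_TO_B = {
    "false": "impossible",
    "possible": "quite possible",
    "true": "sure",
}

# Truth values of both modules, listed from least to most certain.
TRUTH_VALUES_A = ("false", "possible", "true")
TRUTH_VALUES_B = ("impossible", "moderately possible", "quite possible", "sure")


def import_valuation(value_in_a):
    """Translate a valuation of A/P into the term set of module B."""
    translated = RENAMING_A_TO_B[value_in_a]
    if translated not in TRUTH_VALUES_B:
        raise ValueError(f"{translated!r} is not a truth value of module B")
    return translated


def is_order_preserving(renaming, source_terms, target_terms):
    """Assumed sanity check: the renaming never reverses the source order."""
    ranks = [target_terms.index(renaming[term]) for term in source_terms]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))
```

With this encoding, a valuation "possible" of A/P is seen as "quite possible" by the rule R1 of module B.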

Connectives. This component defines the functions that will be used in 
the deduction process associated with the module. Different multiple-valued 
functions can be defined or elicited depending on the task defined by the 
module. Next we explain the process of eliciting the connectives. 

OPERATOR ELICITATION WITH LINGUISTIC TERMS 

The elicitation of connective operators has been widely studied when 
truth values are expressed in the unit interval [0, 1]. By contrast, little 
effort has been devoted to studying what such operators would be like in the case of 
a finite number of truth values. This problem has been encountered in the field 
of Expert Systems when trying to model expert reasoning by means of 
linguistically expressed uncertainty about the truth of rules and facts (Godo et 
al., 1989). Most previous work (Lopez de Mantaras, 1990) on generating 
operators for linguistic terms used some kind of discretization of the continuous 
truth space [0, 1]. In this approach the expert was required to give a 
numerical representation for the linguistic terms (intervals, fuzzy intervals, 
fuzzy labels); then a combination function in [0, 1] was selected to model a 
logical connective. The selection was made according to some properties the 
function should fulfill. Next, the selected function was applied to the 
representations of the terms and, whenever the result of a combination lay 
outside the term set, it was approximated by the "closest" term, in order to keep 
the term set closed under combinations. This approach has some drawbacks, 
however: 

- often the experts supplying the knowledge are not able to define the 
meaning of the linguistic values using a numerical scale, although they have 
no difficulty in ordering them. 

- different experts might not agree on the representation of some or all 
the linguistic values. 

- the necessary approximation process does not always ensure that the 
resulting operators satisfy the properties which were originally required of the 
functions used to generate them. 

These disadvantages led us to propose an alternative approach (Lopez 
de Mantaras et al., 1990). The central idea consists in treating linguistic terms 
as mere labels without assuming any underlying numerical representation, and 
then eliciting the connective operators directly on the set of labels. The only a 
priori requirement is that these labels should represent a totally ordered set of 
linguistic expressions about uncertainty. For each logical connective, a set of 
desirable properties of the corresponding operator is listed. These properties act 
as constraints on the set of possible solutions. In this way, all operators 
fulfilling the set of properties are generated. Afterwards, the domain expert 
may select the one that he thinks best fits his own way of managing 
uncertainty. This approach can be easily implemented by formulating it as a 
constraint satisfaction problem, and most of the disadvantages of the former 
approach are avoided. 
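
As an illustration of the constraint-satisfaction formulation, the sketch below enumerates all conjunction operators over a totally ordered label set by brute force. The list of properties is an assumption on our part, since the text does not enumerate them: commutativity, monotonicity, associativity, the top label as neutral element and the bottom label as absorbing element. The function names are hypothetical, and labels are handled through their positions in the order.

```python
from itertools import product


def is_monotone(t):
    """Non-decreasing in the second argument (and, by symmetry, the first)."""
    n = len(t)
    return all(t[a][b] <= t[a][b + 1] for a in range(n) for b in range(n - 1))


def is_associative(t):
    """T(T(a, b), c) == T(a, T(b, c)) for every triple of labels."""
    n = len(t)
    return all(t[t[a][b]][c] == t[a][t[b][c]]
               for a in range(n) for b in range(n) for c in range(n))


def enumerate_conjunctions(labels):
    """Return every table T over label indices (0 = least certain) that
    satisfies the assumed constraints."""
    n = len(labels)
    top = n - 1
    # Only entries between interior labels are free; the boundary conditions
    # fix the rest, and filling symmetrically gives commutativity.
    free = [(i, j) for i in range(1, top) for j in range(i, top)]
    solutions = []
    for values in product(range(n), repeat=len(free)):
        t = [[0] * n for _ in range(n)]      # T(x, bottom) = bottom
        for x in range(n):
            t[x][top] = t[top][x] = x        # T(x, top) = x
        for (i, j), v in zip(free, values):
            t[i][j] = t[j][i] = v
        if is_monotone(t) and is_associative(t):
            solutions.append(t)
    return solutions


terms = ("impossible", "moderately possible", "quite possible", "sure")
candidates = enumerate_conjunctions(terms)
```

Each table in `candidates` is one admissible conjunction; the expert would then pick among them. The minimum operator is always among the candidates under these assumed properties.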

META-REASONING BY INTROSPECTION USING 
UNCERTAINTY 

Having considered uncertainty as a logical component of the 
COLAPSES language, i.e. as part of the semantics of formulae, the control of 
reasoning under uncertainty must be considered as a component of the meta-logic. 
Thus the meta-inference over the uncertainty determines what the 
inference control will be at the logic level. This meta-inference acts upon the logic 
component using mechanisms of introspection, that is, the same language 
represents the uncertainty of the propositions and provides mechanisms both 
to look at this uncertainty and to determine the control to be followed. 

This meta-control is defined as a component of the modules, allowing a 
local meta-logic definition. This control component acts over the deductive 
knowledge and over the hierarchy of submodules. It determines which rules and 
submodules are useful for the current case. The mechanism of interaction 
between both components is a reflection mechanism: the deductive component 
reflects on the control component to know which will be the next strategic step, 
which submodule to execute next, or which rule to use next. 

It is not a full reflection mechanism because we allow the meta-logic to 
see only the valuation of atomic formulae (facts) and the valuation of strategies 
(sets of modules that, combined, can lead the system to the solution of the 
problem); rules and meta-rules cannot be consulted by the meta-logic. 

This general mechanism is used to guide the inference process in 
different directions; we are going to discuss some of them. 

EVIDENCE INCREASING 

The current uncertainty of facts can be used to control the deduction 
steps in order to increase the evidence for a given hypothesis. So, for example, if 
we have an alcoholic patient with a cavitation in the chest x-ray and there is 
low evidence for tuberculosis, then the Ziehl-Neelsen test, which would determine 
more clearly whether he has tuberculosis, should not be done. But if he presents a 
risk factor for AIDS, then our evidence for tuberculosis increases and 
the test will be suggested. This is expressed as follows: 

    If tuberculosis > moderately possible 
    then conclude Test Ziehl-Neelsen 

    If risk factor for AIDS 
    then conclude tuberculosis is possible 

    If Alcoholic and Cavitation 
    then tuberculosis is almost impossible 

Remark: The first rule is a rule of the meta-logic component of the 
language whilst the others are rules at the logic level. 
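
The interplay of the two levels can be sketched as follows. This is only an illustration under loudly stated assumptions: the full ordered term set is not given in the text, so the one used here is hypothetical, and we assume that concurrent conclusions about the same fact are combined by keeping the most certain one, which is consistent with the prose ("our evidence for tuberculosis increases") but is not specified by the authors. The function names are ours.

```python
# Assumed term set, ordered from least to most certain (not given in the text).
TERMS = ("impossible", "almost impossible", "moderately possible",
         "possible", "quite possible", "sure")
RANK = {term: i for i, term in enumerate(TERMS)}


def tuberculosis_evidence(alcoholic, cavitation, aids_risk):
    """Logic level: fire the two object-level rules and combine conclusions."""
    conclusions = []
    if alcoholic and cavitation:
        conclusions.append("almost impossible")
    if aids_risk:
        conclusions.append("possible")
    if not conclusions:
        return None  # no rule fired, so the fact has no valuation
    # Assumption: concurrent conclusions are combined by taking the maximum.
    return max(conclusions, key=RANK.__getitem__)


def suggest_ziehl_neelsen(evidence):
    """Meta level: inspect the valuation of the fact and decide on the test."""
    return evidence is not None and RANK[evidence] > RANK["moderately possible"]
```

The meta-level function never looks at the rules themselves, only at the valuation of the fact, which mirrors the restricted reflection described above.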

STRATEGY FOCUSING 

The uncertainty of facts can determine the set of hypotheses to be 
pursued next. 

Example: 

    If the pneumonia is bacterial with certainty < quite possible and 
       the pneumonia is atypical with certainty > possible 
    Then consider 
       Mycoplasma, Virus, Chlamydia, Tuberculosis, Nocardia, 
       Cryptococcus, Pneumocystis-carinii 
       with certainty quite possible 

This example means that the modules to be used in order to find a 
solution to the current case are those indicated in the conclusion of the 
meta-rule, and that they should be considered in the order specified there. 

Strategies have a certainty degree attached to them. This is useful to 
differentiate the strategies generated by very specific data from those 
generated by general data. As an example, consider the case of a patient with 
AIDS (which is a kind of immunodepression). If we know that the patient 
suffers from AIDS, a more specific (and also more certain) strategy can be 
generated. But if we only know that the patient has an immunodepression, a less 
certain, general strategy would be generated. Since we may have several 
candidate strategies simultaneously, combining different strategies is a matter 
of great importance in the control of the system. This is also achieved by looking 
at the uncertainty of the strategies, as the next example shows: 

    If Strategy (X) and Strategy (Y) and Certainty (X) > Certainty (Y) 
       and Goals (X) ∩ Goals (Y) ≠ ∅ 
    Then Ockham (X, Y) 

where Ockham (X, Y) is a combination of the strategies that favours those 
modules found in the intersection of both strategies. 

KNOWLEDGE ADEQUATION 

As indicated at the beginning of the paper, a KB is a set of knowledge 
units that have to be adapted to the current case. For example, alcoholism is a 
useful concept when diagnosing a bacterial pneumonia, but it is useless for 
non-bacterial diseases. Thus a possible use of the uncertainty of 
the fact that the disease is bacterial is to decide about the use of a given 
concept in the whole KB, i.e. to adapt the general knowledge to the particular 
problem. Example: 

    If no bacterial disease 
    then do not consider alcoholism in the search of the solution 

SOLUTION ACCEPTANCE 

The degree of uncertainty of a fact can also be used to stop the 
execution of the system. For example: 

    If Pneumocystis-carinii and tuberculosis < possible 
       and Cryptococcus < possible 
    Then stop 

The control tasks we have discussed use uncertainty as a control 
parameter and are tasks of the meta-logic level. They are represented as a local 
meta-logic component of each module in what is called the control knowledge 
component of a module. In the next section we shall describe this locality in 
some detail. 

METACONTROL AND LOCALITY 

The structured definition of KB's not only helps in the definition of safe 
and maintainable KB's but also provides some new features that were impossible 
to achieve in the previous generation of systems. Among them the most 
important is the possibility of defining a local meta-logical component for each 
one of the modules. 

The definition of strategies (ordered sets of elementary steps to solve a 
problem) in a previous version of the MILORD system (Godo et al., 1989) was 
made globally. Only one strategy could be active at any moment. Presently, as 
many strategies as there are nodes in the module graph structure can be active. 
This flexibility is linked with the fact that each module can have a different 
treatment of uncertainty. So uncertainty plays a different role as a control 
feature depending on the association between module and logic. 

Furthermore, given that the system consists of a hierarchy of 
submodules, the meta-logical components act upon one another in a 
pyramidal fashion. This allows us to have as many meta-logic levels as 
necessary in an application. Further research will be pursued along this line. A 
richer representation of the logic components in the meta-logic will also be 
investigated, and sound semantics from the logic point of view will be defined. 

CONCLUSION 

One interesting aspect of building expert systems is to learn something 
about human problem solving strategies by trying to reproduce them in 
programs. Human problem solvers are uncertain in many situations and do 
not use a simple normative method to handle uncertainty. Instead they take 
advantage of a good organization of the problem solving task to obtain good 
solutions using qualitative approximations. This suggests considering 
uncertainty as playing an important role at the control level by guiding the 
problem solving strategies. In order to illustrate these points, we have 
described a modular architecture and language that extensively exploits 
uncertainty as a control feature and uses local, context-dependent operators 
for the combination and propagation of uncertainty. 

Abstract 

We show how fuzzy logic with linguistic quantifiers, mainly its calculi of 
linguistically quantified propositions, can be used in group decision making. 
Basically, fuzzy linguistic quantifiers (exemplified by most, almost all, ...) are 
employed to represent a fuzzy majority, which is in many cases closer to the real 
human perception of the very essence of majority. Fuzzy logic provides here the 
means for a formal handling of such a fuzzy majority, which was not possible using 
the traditional formal apparatus. Using a fuzzy majority, and assuming fuzzy 
individual and social preference relations, we redefine solution concepts in group 
decision making, and present new «soft» degrees of consensus. 


Keywords 

Fuzzy logic, linguistic quantifier, fuzzy preference relation, fuzzy majority, 
group decision making, social choice. 


1. INTRODUCTION 

Decision making, whose essence is basically to find a best option from among 
some feasible (relevant, available, ...) ones, is what human beings constantly face 
in all their activities. In virtually all nontrivial situations decision making 
requires intelligence. 

Due to the increasing complexity of the environments in which decisions are to be 
made today, the human decision maker is often under pressure and stress, and 
overloaded. Some (computerized) decision support may therefore be of much help. 
Since, as we mentioned before, intelligence is required, a decision support system 
should be what might be termed intelligent. However, in spite of considerable 
progress in broadly perceived artificial intelligence, we are still far from knowing 
definitely how to devise intelligent systems, i.e. in our context how to introduce 
intelligence into decision support systems. 

One of the crucial difficulties in this respect is that decision support should rely 
on some formal decision making models. Unfortunately, though there is an abundance 
of them, for virtually all imaginable situations, they have been developed within a 
traditionally perceived mathematical direction where, roughly speaking, «nice» 
formal properties have had priority over «human consistency». This has led to some 
crucial problems, among which what may be termed an implementation barrier is 
certainly of primary concern. Basically, its essence is that human decision makers 
are often not willing to accept results obtained by formally (mathematically) valid 
models. 

Attempts to incorporate some sort of human consistency (which may be viewed 
as a first step to the incorporation of intelligence) in decision making models have 
been undertaken for a long time (see e.g. Braybrooke and Lindblom, 1963). For 
instance, in this perspective we can view various aspiration-level-based 
approaches in which, say, a strict optimization (which is often contradictory to a 
real and human perception of the problem’s specifics) is replaced by a much milder 
requirement to attain some levels of satisfaction (see Simon, 1972). 

There have also been attempts to attain the above mentioned human consistency 
by means of fuzzy-logic-based tools. This has mainly involved the use of calculi 
of linguistically quantified statements. These attempts have concerned multicriteria 
decision making (cf. Kacprzyk and Yager, 1984a, b, 1990; Yager, 1983a, b, 1984, 
1985a, b), multistage decision making (Kacprzyk, 1983; Kacprzyk and Iwanski, 
1987), and group decision making and consensus formation, which will be discussed 
in more detail in this paper. For more general papers on issues related to that 
fuzzy-logic-based perspective on human consistency, see also Kacprzyk (1987b). 

In this paper we will consider the problem of how fuzzy logic may be used to 
attain a higher human consistency of group decision making and consensus models. 
Such models, adopting our perspective, may help provide a basis of intelligent 
decision support systems for group decision making and consensus formation. 

The essence of group decision making may be summarized as follows. There is a 
set of options and a set of individuals who provide their preferences over the set of 
options. The problem is basically to find a solution, meant to be an option (or a set 
of options) which is most acceptable to the group of individuals as a whole. 

Though the above basic problem formulation seems to be extremely simple, 
maybe even trivial, it is certainly not. Since its very beginning group decision 
making has been plagued by negative results exemplified by Arrow’s general 
impossibility theorem, Gibbard’s and Satterthwaite’s results on the manipulability 
of social choice functions, McKelvey’s and Schofield’s findings on the instability of 
solutions in spatial contexts, etc. (Arrow, 1963; Gibbard, 1973; Satterthwaite, 
1975; McKelvey, 1979; Schofield, 1984; see also Nurmi, 1987; Nurmi, Fedrizzi 
and Kacprzyk, 1990). Basically, all these findings can be summarized as follows: no 
matter which group choice procedure we employ, it will satisfy some set of 
plausible conditions but not another set of equally plausible ones. This general 
property pertains to all possible choice procedures, so that attempts to develop new, 
more sophisticated choice procedures do not seem very promising in this respect. 
It seems much more promising to modify some basic assumptions underlying 
the group decision making process. This line of reasoning is pursued here. 

Since the process of decision making, notably of group type, is centered on 
human beings, with their inherent subjectivity, imprecision and vagueness in the 
articulation of opinions, etc., fuzzy sets have been used in this field for a long time. 
A predominant research direction is here based on the introduction of an individual 
or social fuzzy preference relation which is then used to find some choice sets. 
There is a rich literature on this topic (cf. Tanino, 1984, 1988, or many articles in 
Kacprzyk and Fedrizzi, 1990), and since this is not explicitly related to the use of 
fuzzy logic, we will not discuss these issues in more detail here (though we will 
assume that the preference relations are fuzzy). We will concentrate on other 
elements of group decision making models where a contribution of fuzzy logic can 
be explicitly demonstrated. 

One of the basic elements underlying group decision making is the concept of a 
majority (notice that the solution is to be some option(s) most acceptable to the 
group as a whole, that is to most of its members, since in no real situation would it 
be accepted by all). Some of the above mentioned problems with group decision 
making are closely related to a (too) strict perception of majority (e.g., at least a 
half). A natural line of reasoning is to try to somehow make that strict concept of 
majority closer to its human perception. And here we find many examples in all 
kinds of human judgments showing that what human beings consider as a required 
majority to, say, justify the choice of a course of action is often much more vague. 
A good example in a biological context may be found in Loewer and Laddaga 
(1985): «... It can correctly be said that there is a consensus among biologists that 
Darwinian natural selection is an important cause of evolution though there is 
currently no consensus concerning Gould’s hypothesis of speciation. This means 
that there is a widespread agreement among biologists concerning the first matter 
but disagreement concerning the second ...». A rigid majority such as, e.g., more than 
75% would evidently not reflect the essence of the above statement. It should be 
noted that there are naturally situations in which a strict majority is necessary, for 
obvious reasons, as in all political elections. 

To briefly summarize the above considerations, we can say that a possibility to 
accommodate a less rigid («soft») majority (as, say, an equivalent of a widespread 
agreement in the above citation) would certainly help make group decision models 
more human consistent. 

It is easy to see that the most natural manifestations of such a «soft» majority are 
the so-called linguistic quantifiers such as, e.g., most, almost all, much more than a 
half, etc. One can readily notice that no conventional formal (e.g., logical) apparatus 
provides means for handling such quantifiers since, e.g., in virtually all 
conventional logics only two quantifiers, at least one and all, are accounted for. 

Fortunately, some fuzzy-logic-based calculi of linguistically quantified 
propositions have been proposed in recent years (Yager, 1983a, b; Zadeh, 
1983), which make it possible to handle fuzzy linguistic quantifiers. These 
calculi have been applied by the authors to introduce a fuzzy majority (represented 
by a fuzzy linguistic quantifier) into group decision making and consensus 
formation models (Fedrizzi and Kacprzyk, 1988; Kacprzyk, 1984, 1985b, 1986, 
1987a; Kacprzyk and Fedrizzi, 1986, 1988, 1989; Kacprzyk, Fedrizzi and Nurmi, 
1990; Kacprzyk and Nurmi, 1988; Nurmi and Kacprzyk, 1990; Nurmi, Fedrizzi and 
Kacprzyk, 1990), and also in an implemented decision support system for consensus 
reaching (Fedrizzi, Kacprzyk and Zadrozny, 1988; Kacprzyk, Fedrizzi and Zadrozny, 
1988). 

All this is clearly an example of a contribution that fuzzy logic (with linguistic 
quantifiers) can make to qualitatively improve group decision making models. 

We will briefly present below the essence of this approach, trying to maintain 
readability, and referring the reader who might be interested in more detail to the 
relevant literature. 

Our notation related to fuzzy sets is standard. A fuzzy set A in X, $A \subseteq X$, is 
characterized by, and often equated with, its membership function $\mu_A: X \to [0, 1]$; 
$\mu_A(x) \in [0, 1]$ is the grade of membership of x in A, from full membership to full 
nonmembership through all intermediate values. For a finite $X = \{x_1, \ldots, x_n\}$ we 
write $A = \mu_A(x_1)/x_1 + \ldots + \mu_A(x_n)/x_n$, where ‘$\mu_A(x_i)/x_i$’ is the pair ‘grade of 
membership-element’ and «+» is meant in the set-theoretic sense. Moreover, we 
denote $a \wedge b = \min(a, b)$, $a \vee b = \max(a, b)$, and «$\to$» stands for an implication 
operator in multivalued logic. Other, more specific notation will be introduced when 
needed. 


2. FUZZY-LOGIC-BASED CALCULI OF LINGUISTICALLY 
QUANTIFIED PROPOSITIONS 

Linguistically quantified propositions (statements) are commonly used in 
everyday life and may be exemplified by, say, «most experts are convinced» or 
«almost all good cars are expensive». 

In general, we can write a linguistically quantified proposition as 

    Qy’s are F                                                      (1) 

where Q is a linguistic quantifier (e.g., most), Y = {y} is a set of objects (e.g., 
experts), and F is a property (e.g., convinced). 

It is quite natural that we may wish to assign to the particular y’s (objects) a 
different importance (or relevance from the point of view of the fact mentioned in 
the statement). Importance, B, may therefore be added to (1) yielding 

    QBy’s are F                                                     (2) 

that is, say, «most (Q) of the important (B) experts (y’s) are convinced (F)». 

For our purposes, the main problem is now to find the truth of such 
linguistically quantified statements, i.e. either truth (Qy’s are F) or truth (QBy’s are 
F), knowing truth ($y_i$ is F), $\forall y_i \in Y$. This may be done using two basic calculi, 
one due to Zadeh (1983) and one due to Yager (1983a, b). In the following we will 
present the essence of Zadeh’s calculus since it is simpler and more transparent, 
hence better suited for the purposes of this volume, though we should bear in mind 
that in many instances Yager’s calculus may be more «adequate» (cf. Kacprzyk, 
1986, 1987b; Kacprzyk and Fedrizzi, 1989). 

In Zadeh’s (1983) method, a fuzzy linguistic quantifier Q is assumed to be a 
fuzzy set defined in [0, 1]. 

For instance, Q = «most» may be given as 

$$\mu_{\text{most}}(x) = \begin{cases} 1 & \text{for } x \ge 0.8 \\ 2x - 0.6 & \text{for } 0.3 < x < 0.8 \\ 0 & \text{for } x \le 0.3 \end{cases} \tag{3}$$

which may be read as follows: if at least 80% of some elements satisfy a property, 
then most of them certainly (to degree 1) satisfy it; when less than 30% of them 
satisfy a property, then most of them certainly do not satisfy it (satisfy to degree 0); 
and between 30% and 80%, the more of them satisfy that property, the higher the 
degree of satisfaction by most of the elements. 

Notice that we will consider here the proportional quantifiers exemplified by 
«most», «almost all», etc. as they are more important for the modelling of a fuzzy 
majority than the absolute quantifiers exemplified by «about 5», «much more than 
10», etc. The reasoning for the absolute quantifiers is however analogous. 

Property F is defined as a fuzzy set in Y. For instance, if Y = {X, Y, Z} is the 
set of experts and F is a property «convinced», then F may be exemplified by F = 
«convinced» = 0.1/X + 0.6/Y + 0.8/Z, which means that expert X is convinced to 
degree 0.1, Y to degree 0.6 and Z to degree 0.8. If now $Y = \{y_1, \ldots, y_p\}$, then it is 
assumed that truth ($y_i$ is F) = $\mu_F(y_i)$, i = 1, ..., p. 

The value of truth (Qy’s are F) is determined in the following two steps (Zadeh, 
1983): 

$$r = \Sigma\mathrm{Count}(F) / \Sigma\mathrm{Count}(Y) = \frac{1}{p} \sum_{i=1}^{p} \mu_F(y_i) \tag{4}$$

$$\mathrm{truth}(Qy\text{'s are } F) = \mu_Q(r) \tag{5}$$

Basically, (4) determines some mean proportion of elements satisfying the 
property under consideration, and (5) determines the degree to which this proportion 
satisfies the meaning of Q. 

In the case of importance added, B is defined as a fuzzy set in Y, and $\mu_B(y_i) \in 
[0, 1]$ is a degree of importance of $y_i$: from 1 for definitely important to 0 for 
definitely unimportant, through all intermediate values. For instance, B = 
«important» = 0.2/X + 0.5/Y + 0.6/Z means that expert X is important (competent) 
to degree 0.2, Y to degree 0.5, and Z to degree 0.6. 

We first rewrite «QBy’s are F» as «Q(B and F)y’s are B», which leads to the 
following counterparts of (4) and (5): 

$$r' = \Sigma\mathrm{Count}(B \text{ and } F) / \Sigma\mathrm{Count}(B) = \frac{\sum_{i=1}^{p} \left( \mu_B(y_i) \wedge \mu_F(y_i) \right)}{\sum_{i=1}^{p} \mu_B(y_i)} \tag{6}$$

$$\mathrm{truth}(QBy\text{'s are } F) = \mu_Q(r') \tag{7}$$

The essence of these two steps is similar to that of (4) and (5). 

Example 1. Let Y = «experts» = {X, Y, Z}, F = «convinced» = 0.1/X + 0.6/Y + 
0.8/Z, Q = «most» be given by (3), and B = «important» = 0.2/X + 0.5/Y + 0.6/Z. 
Then r = 0.5, r’ = 0.8, and truth («most experts are convinced») = 0.4 and truth 
(«most of the important experts are convinced») = 1. 
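
The two-step computation of (4)-(5) and (6)-(7) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are ours, fuzzy sets are encoded as dictionaries from elements to membership grades, and «and» in (6) is taken as the minimum, following the notation introduced in Section 1.

```python
def mu_most(x):
    """Membership function of the quantifier «most», equation (3)."""
    if x >= 0.8:
        return 1.0
    if x <= 0.3:
        return 0.0
    return 2.0 * x - 0.6


def truth_q(mu_q, f):
    """truth(Qy's are F): equations (4) and (5). Returns (r, truth)."""
    r = sum(f.values()) / len(f)
    return r, mu_q(r)


def truth_qb(mu_q, f, b):
    """truth(QBy's are F): equations (6) and (7), with 'and' taken as min."""
    r_prime = sum(min(b[y], f[y]) for y in f) / sum(b.values())
    return r_prime, mu_q(r_prime)


# Data of Example 1.
convinced = {"X": 0.1, "Y": 0.6, "Z": 0.8}
important = {"X": 0.2, "Y": 0.5, "Z": 0.6}

r, truth_most = truth_q(mu_most, convinced)
r_prime, truth_most_important = truth_qb(mu_most, convinced, important)
print(round(r, 3), round(truth_most, 3))
print(round(r_prime, 3), round(truth_most_important, 3))
```

On the data of Example 1 this sketch gives r = 0.5 and a truth value of 0.4 for «most experts are convinced». With the minimum as «and», (6) evaluates to 1.2/1.3, about 0.92, rather than the 0.8 printed in the example; since both values are at least 0.8, the truth value of «most of the important experts are convinced» is 1 either way.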

The method presented is simple and efficient and has proven to be useful in a 
multitude of cases. Sometimes, however, it may lead to somewhat counterintuitive 
results (cf. Yager, 1983b). An alternate calculus by Yager (1983a, b; 1985a, b) may 
often be more useful though it is far more complicated, conceptually and 
numerically, and will not be dealt with here. 


3. GROUP DECISION MAKING UNDER FUZZY PREFERENCES 
WITH A FUZZY MAJORITY REPRESENTED BY A 
LINGUISTIC QUANTIFIER 

The purpose of this section is to redefine some solution concepts of group 
decision making under fuzzy preference relations by employing Zadeh’s 
fuzzy-logic-based calculus of linguistically quantified propositions to deal with a 
fuzzy majority. 

To set the stage for the discussion that follows, we will sketch the essence of group 
decision making. We have a set of n options, $S = \{s_1, \ldots, s_n\}$, and a set of 
m individuals, $I = \{1, \ldots, m\}$. Each individual $k \in I$ provides his or her preferences 
over S. Since these preferences may not be clear-cut, their representation by 
individual fuzzy preference relations is strongly advocated (see, e.g., the articles in 
Kacprzyk and Fedrizzi, 1990). 

A fuzzy preference relation of individual k, $R_k$, is given by its membership 
function $\mu_{R_k}: S \times S \to [0, 1]$ such that 

$$\mu_{R_k}(s_i, s_j) = \begin{cases} 1 & \text{if } s_i \text{ is definitely preferred over } s_j \\ c \in (0.5, 1) & \text{if } s_i \text{ is slightly preferred over } s_j \\ 0.5 & \text{if there is no preference (i.e. indifference)} \\ d \in (0, 0.5) & \text{if } s_j \text{ is slightly preferred over } s_i \\ 0 & \text{if } s_j \text{ is definitely preferred over } s_i \end{cases} \tag{8}$$

If card S is small enough, as we assume here, $R_k$ may be represented by a matrix 
$R_k = [r_{ij}^k]$, $r_{ij}^k = \mu_{R_k}(s_i, s_j)$; i, j = 1, ..., n; k = 1, ..., m. $R_k$ is commonly 
assumed (also here) to be reciprocal, i.e. $r_{ij}^k + r_{ji}^k = 1$; moreover, $r_{ii}^k = 0$, for all i, j, k. 

The fuzzy preference relations, like their nonfuzzy counterparts, are 
evidently a point of departure for devising a multitude of solution concepts. 
Basically, two lines of reasoning may be followed here (cf. Kacprzyk, 1986): 

- a direct approach 

    $\{R_1, \ldots, R_m\} \to$ solution 

- an indirect approach 

    $\{R_1, \ldots, R_m\} \to R \to$ solution 

that is, in the first case we determine a solution just on the basis of the individual 
fuzzy preference relations, and in the second case we first form a social fuzzy 
preference relation R (defined similarly to its individual counterpart but concerning 
the whole group of individuals), which is then used to find a solution. What counts 
as a solution is not uniquely defined here; see, e.g., Nurmi (1983, 1988 a) for diverse 
solution concepts. 

More details related to the use of fuzzy preference relations as a point of departure 
in group decision making can be found in, e.g., Nurmi (1988) and in other articles 
in Kacprzyk and Roubens (1988) or Kacprzyk and Fedrizzi (1990). 

Here we will show how to redefine some better known solution concepts, the 
core for the direct approach and the consensus winner for the indirect approach, using 
a fuzzy majority represented by a linguistic quantifier. 


3.1. Direct derivation of a solution - the core

Among many solution concepts proposed in the literature for the direct approach
(i.e. for $\{R_1, \ldots, R_m\} \to$ solution) the core is intuitively appealing
and often used. Conventionally, the core is defined as a set of undominated
options, i.e. those not defeated in pairwise comparisons by a required majority
(strict!) r < m, i.e.

$$
C = \{ s_j \in S : \neg \exists\, s_i \in S \text{ such that } r^k_{ij} > 0.5
\text{ for at least } r \text{ individuals} \} \qquad (9)
$$

Nurmi (1981) extends the core to the fuzzy α-core defined as

$$
C_\alpha = \{ s_j \in S : \neg \exists\, s_i \in S \text{ such that }
r^k_{ij} > \alpha \ge 0.5 \text{ for at least } r \text{ individuals} \} \qquad (10)
$$

i.e. as a set of options not sufficiently (at least to degree α) defeated by the
required majority.
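As an illustration of definitions (9) and (10), the following is a minimal Python sketch under stated assumptions: the function name `alpha_core`, the 0-based option indices and the three identical preference matrices are hypothetical and are not taken from the paper. With the default threshold 0.5 the function corresponds to the classical core (9); a larger threshold gives the α-core (10).

```python
def alpha_core(relations, r, alpha=0.5):
    """Return the 0-based indices of the options in the (alpha-)core.

    relations: list of m square matrices, relations[k][i][j] = r^k_ij.
    r: required number of individuals (the majority).
    alpha: defeat threshold, alpha >= 0.5; 0.5 gives the classical core.
    """
    n = len(relations[0])
    core = []
    for j in range(n):
        # s_j is defeated if some s_i beats it for at least r individuals.
        defeated = any(
            sum(1 for R in relations if R[i][j] > alpha) >= r
            for i in range(n)
            if i != j
        )
        if not defeated:
            core.append(j)
    return core


# Hypothetical data: three individuals sharing the same reciprocal relation.
R = [[0.0, 0.8, 0.7],
     [0.2, 0.0, 0.6],
     [0.3, 0.4, 0.0]]
relations = [R, R, R]

classical = alpha_core(relations, r=2)             # definition (9)
stricter = alpha_core(relations, r=2, alpha=0.75)  # definition (10)
print(classical, stricter)
```

Raising α makes it harder to defeat an option, so the α-core can only grow relative to the classical core for the same data.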

Suppose now that the required majority is imprecisely specified, e.g. by a
fuzzy linguistic quantifier such as «most» defined by (3).

While trying to redefine the above concepts of cores under a fuzzy majority, we
start by denoting

$$
h^k_{ij} =
\begin{cases}
1 & \text{if } r^k_{ij} < 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (11)
$$

where here and later on in this section, if not otherwise specified,
i, j = 1, ..., n and k = 1, ..., m. Thus, $h^k_{ij}$ reflects whether option
$s_j$ defeats $s_i$ or not.

Then

$$
h^k_j = \frac{1}{n-1} \sum_{i=1,\, i \neq j}^{n} h^k_{ij} \qquad (12)
$$

is the extent to which individual k is not against option $s_j$.

Next

$$
h_j = \frac{1}{m} \sum_{k=1}^{m} h^k_j \qquad (13)
$$

is to what extent all the individuals are not against $s_j$.

And

$$
v^j_Q = \mu_Q(h_j) \qquad (14)
$$

is to what extent Q (say, most) individuals are not against $s_j$.

The fuzzy Q-core is now defined as a fuzzy set

$$
C_Q = v^1_Q / s_1 + \ldots + v^n_Q / s_n \qquad (15)
$$

i.e. a fuzzy set of options that are not defeated by Q (say, most) individuals.

Analogously, by introducing a threshold of the degree of defeat in (11), we can
define the fuzzy α/Q-core. First, we denote

$$
h^k_{ij}(\alpha) =
\begin{cases}
1 & \text{if } r^k_{ij} < \alpha \le 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (16)
$$

and then, following the line of reasoning (12) - (15), and using
$h^k_j(\alpha)$, $h_j(\alpha)$ and $v^j_Q(\alpha)$, respectively, we define the
fuzzy α/Q-core as

$$
C_{\alpha/Q} = v^1_Q(\alpha) / s_1 + \ldots + v^n_Q(\alpha) / s_n \qquad (17)
$$

i.e. a fuzzy set of options that are not sufficiently (at least to degree 1 - α)
defeated by Q individuals.





We can also explicitly introduce the strength of defeat into (11) and define the
fuzzy s/Q-core. Namely, we can introduce a function like

$$
\hat{h}^k_{ij} =
\begin{cases}
2\,(0.5 - r^k_{ij}) & \text{if } r^k_{ij} < 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (18)
$$

and then, following the line of reasoning (12) - (15), but using
$\hat{h}^k_j$, $\hat{h}_j$ and $\hat{v}^j_Q$ instead of $h^k_j$, $h_j$ and
$v^j_Q$, respectively, we define the fuzzy s/Q-core as

$$
C_{s/Q} = \hat{v}^1_Q / s_1 + \ldots + \hat{v}^n_Q / s_n \qquad (19)
$$

i.e. as a fuzzy set of options that are not strongly defeated by Q individuals.

Example 2. Suppose that we have four individuals, k = 1, 2, 3, 4, whose fuzzy
preference relations (rows indexed by i, columns by j) are

$$
R_1 = \begin{bmatrix}
0 & 0.3 & 0.7 & 0.1 \\
0.7 & 0 & 0.6 & 0.6 \\
0.3 & 0.4 & 0 & 0.2 \\
0.9 & 0.4 & 0.8 & 0
\end{bmatrix}
\qquad
R_2 = \begin{bmatrix}
0 & 0.4 & 0.6 & 0.2 \\
0.6 & 0 & 0.7 & 0.4 \\
0.4 & 0.3 & 0 & 0.1 \\
0.8 & 0.6 & 0.9 & 0
\end{bmatrix}
$$

$$
R_3 = \begin{bmatrix}
0 & 0.5 & 0.7 & 0 \\
0.5 & 0 & 0.8 & 0.4 \\
0.3 & 0.2 & 0 & 0.2 \\
1 & 0.6 & 0.8 & 0
\end{bmatrix}
\qquad
R_4 = \begin{bmatrix}
0 & 0.4 & 0.7 & 0.8 \\
0.6 & 0 & 0.4 & 0.3 \\
0.3 & 0.6 & 0 & 0.1 \\
0.7 & 0.7 & 0.9 & 0
\end{bmatrix}
$$

Suppose now that the fuzzy linguistic quantifier is Q = «most» defined by (3).
Then, say,

$$
C_{\text{«most»}} \cong \tfrac{17}{30} / s_2 + 1 / s_4
$$

$$
C_{0.3/\text{«most»}} = 0.9 / s_4
$$

$$
C_{s/\text{«most»}} = 0.4 / s_4
$$

that is, for instance, in the case of $C_{\text{«most»}}$ option $s_2$ belongs
to the fuzzy Q-core to the extent 17/30 and option $s_4$ to the extent 1, and
analogously for $C_{0.3/\text{«most»}}$ and $C_{s/\text{«most»}}$. Notice that
though the results are different, for obvious reasons, $s_4$ is clearly the
best choice, which is evident if we examine the given individual fuzzy
preference relations.
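The chain (11) - (15) can be traced numerically with a short Python sketch. It rests on assumptions that should be read with care: the piecewise-linear form of «most» in `mu_most` is an assumed reading of definition (3), which lies outside this section; the matrices are the four relations of Example 2 as they can be read from the text, with entry (4, 1) of the second relation taken as 0.8 by reciprocity; and the function names are hypothetical. Under these assumptions the sketch gives the memberships 17/30 for $s_2$ and 1 for $s_4$ quoted in Example 2.

```python
def mu_most(x):
    """Assumed membership function of the quantifier «most»."""
    if x >= 0.8:
        return 1.0
    if x <= 0.3:
        return 0.0
    return 2.0 * x - 0.6


R1 = [[0, 0.3, 0.7, 0.1], [0.7, 0, 0.6, 0.6], [0.3, 0.4, 0, 0.2], [0.9, 0.4, 0.8, 0]]
R2 = [[0, 0.4, 0.6, 0.2], [0.6, 0, 0.7, 0.4], [0.4, 0.3, 0, 0.1], [0.8, 0.6, 0.9, 0]]
R3 = [[0, 0.5, 0.7, 0], [0.5, 0, 0.8, 0.4], [0.3, 0.2, 0, 0.2], [1, 0.6, 0.8, 0]]
R4 = [[0, 0.4, 0.7, 0.8], [0.6, 0, 0.4, 0.3], [0.3, 0.6, 0, 0.1], [0.7, 0.7, 0.9, 0]]


def fuzzy_q_core(relations, mu_q):
    """Membership of each option in the fuzzy Q-core, following (11)-(15)."""
    m = len(relations)
    n = len(relations[0])
    memberships = []
    for j in range(n):
        # (11)-(12): share of options s_i that do not defeat s_j for individual k.
        per_individual = [
            sum(1 for i in range(n) if i != j and R[i][j] < 0.5) / (n - 1)
            for R in relations
        ]
        h_j = sum(per_individual) / m     # (13)
        memberships.append(mu_q(h_j))     # (14)
    return memberships                    # coefficients of (15)


core = fuzzy_q_core([R1, R2, R3, R4], mu_most)
print(core)
```

With the matrices as read here, the sketch also assigns $s_1$ a small nonzero membership (about 0.07), which the approximate result quoted in the example does not list.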


3.2. Indirect derivation of a solution - the consensus winner 

We now follow the scheme $\{R_1, \ldots, R_m\} \to R \to$ solution, i.e. from
the individual fuzzy preference relations we first determine a social fuzzy
preference relation, which is similar to its individual counterpart but
concerns the whole group of individuals, and then find a solution from the
social fuzzy preference relation.

We will not deal here with the first step, i.e. $\{R_1, \ldots, R_m\} \to R$,
and assume that $R = [r_{ij}]$ is given by

$$
r_{ij} =
\begin{cases}
\dfrac{1}{m} \sum_{k=1}^{m} a^k_{ij} & \text{if } i \neq j \\
0 & \text{otherwise}
\end{cases}
\qquad (20)
$$

where

$$
a^k_{ij} =
\begin{cases}
1 & \text{if } r^k_{ij} > 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (21)
$$

Notice that R need not be reciprocal (for reciprocal $R_1, \ldots, R_m$). For
other approaches to the determination of R, see, e.g., Blin and Whinston (1973).
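A minimal Python sketch of (20) and (21) follows. The three individual relations are hypothetical reciprocal matrices chosen only for illustration, and the function name `social_relation` is likewise an assumption. Exact fractions are used so that the entries can be compared without rounding.

```python
from fractions import Fraction


def social_relation(relations):
    """Social fuzzy preference relation built as in (20)-(21)."""
    m = len(relations)
    n = len(relations[0])
    return [
        [
            # (21): a^k_ij = 1 when individual k prefers s_i to s_j; (20): average.
            Fraction(sum(1 for R in relations if R[i][j] > 0.5), m) if i != j
            else Fraction(0)
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical reciprocal individual relations (three individuals, three options).
R_a = [[0, 0.7, 0.2], [0.3, 0, 0.6], [0.8, 0.4, 0]]
R_b = [[0, 0.6, 0.4], [0.4, 0, 0.3], [0.6, 0.7, 0]]
R_c = [[0, 0.5, 0.1], [0.5, 0, 0.9], [0.9, 0.1, 0]]

social = social_relation([R_a, R_b, R_c])
for row in social:
    print([str(x) for x in row])
```

In this data one individual is indifferent between the first two options, so the social entries for that pair sum to 2/3 rather than 1, which illustrates why R need not be reciprocal even when every individual relation is.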

We will now discuss the second step, i.e. $R \to$ solution, that is, how to
determine a solution from a social fuzzy preference relation. A solution
concept of much intuitive appeal is here the consensus winner (Nurmi, 1981),
which will be extended here under a fuzzy majority expressed by a fuzzy
linguistic quantifier.

We start with

$$
g_{ij} =
\begin{cases}
1 & \text{if } r_{ij} > 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (22)
$$

which expresses whether $s_i$ defeats $s_j$ or not, and then

$$
g_i = \frac{1}{n-1} \sum_{j=1,\, j \neq i}^{n} g_{ij} \qquad (23)
$$

which is a mean degree to which option $s_i$ is preferred over all the other
options. Next

$$
z^i_Q = \mu_Q(g_i) \qquad (24)
$$

is the extent to which $s_i$ is preferred over Q other options.

Finally, we define the fuzzy Q-consensus winner as

$$
W_Q = z^1_Q / s_1 + \ldots + z^n_Q / s_n \qquad (25)
$$

i.e. as a fuzzy set of options that are preferred over Q other options.

Analogously to the case of the core, we can introduce a threshold into (22),
i.e.

$$
g_{ij}(\alpha) =
\begin{cases}
1 & \text{if } r_{ij} > \alpha \ge 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (26)
$$

and then, following the reasoning (23) and (24), and replacing $g_i$ and
$z^i_Q$ by $g_i(\alpha)$ and $z^i_Q(\alpha)$, respectively, we can define the
fuzzy α/Q-consensus winner as

$$
W_{\alpha/Q} = z^1_Q(\alpha) / s_1 + \ldots + z^n_Q(\alpha) / s_n \qquad (27)
$$

i.e. as a fuzzy set of options that are sufficiently (at least to degree α)
preferred over Q (say, most) other options.

Furthermore, we can also explicitly introduce the strength of preference into
(22) by, e.g., defining

$$
\hat{g}_{ij} =
\begin{cases}
2\,(r_{ij} - 0.5) & \text{if } r_{ij} > 0.5 \\
0 & \text{otherwise}
\end{cases}
\qquad (28)
$$

and then, following the reasoning (23) and (24), and replacing $g_i$ and
$z^i_Q$ by $\hat{g}_i$ and $\hat{z}^i_Q$, respectively, we can define the fuzzy
s/Q-consensus winner as

$$
W_{s/Q} = \hat{z}^1_Q / s_1 + \ldots + \hat{z}^n_Q / s_n \qquad (29)
$$

i.e. as a fuzzy set of options that are strongly preferred over Q other options.

For more details on the above solution concepts, as well as on some other ones, 
see, e.g., Kacprzyk (1985b, c; 1986a) and Kacprzyk and Nurmi (1988). 

Example 3. For the same individual fuzzy preference relations as in Example 2,
and using (20) and (21), we obtain the following social fuzzy preference
relation (rows indexed by i, columns by j)

$$
R = \begin{bmatrix}
0 & 0 & 1 & 0 \\
3/4 & 0 & 3/4 & 1/4 \\
0 & 1/4 & 0 & 0 \\
1 & 3/4 & 1 & 0
\end{bmatrix}
$$

If now Q = «most» is given by (3), then we obtain

$$
W_{\text{«most»}} = \tfrac{1}{15} / s_1 + \tfrac{11}{15} / s_2 + 1 / s_4
$$

$$
W_{0.8/\text{«most»}} = \tfrac{1}{15} / s_1 + \tfrac{11}{15} / s_4
$$

$$
W_{s/\text{«most»}} = \tfrac{1}{15} / s_1 + \tfrac{1}{15} / s_2 + 1 / s_4
$$

which are to be read in the same way as the fuzzy cores in Example 2. Notice
that here once again option $s_4$ is clearly the best choice, which is obvious
from the social fuzzy preference relation.
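The step from the social relation to the fuzzy Q-consensus winner, (22) - (25), can be sketched in Python as follows. The social relation is the matrix given in Example 3; the form of «most» in `mu_most` is an assumed reading of definition (3), which lies outside this section, and the function names are hypothetical. Exact fractions keep the memberships free of rounding.

```python
from fractions import Fraction as F


def mu_most(x):
    """Assumed membership function of the quantifier «most»."""
    if x >= F(4, 5):
        return F(1)
    if x <= F(3, 10):
        return F(0)
    return 2 * x - F(3, 5)


# Social fuzzy preference relation of Example 3.
R = [
    [F(0), F(0), F(1), F(0)],
    [F(3, 4), F(0), F(3, 4), F(1, 4)],
    [F(0), F(1, 4), F(0), F(0)],
    [F(1), F(3, 4), F(1), F(0)],
]


def q_consensus_winner(R, mu_q):
    """Membership of each option in the fuzzy Q-consensus winner, (22)-(25)."""
    n = len(R)
    memberships = []
    for i in range(n):
        # (22)-(23): share of the other options that s_i defeats.
        g_i = F(sum(1 for j in range(n) if j != i and R[i][j] > F(1, 2)), n - 1)
        memberships.append(mu_q(g_i))   # (24)
    return memberships                  # coefficients of (25)


winner = q_consensus_winner(R, mu_most)
print([str(z) for z in winner])
```

Under the assumed quantifier, the fourth option is the only one with full membership, in line with the observation that $s_4$ is the best choice.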

This concludes our brief exposition of how to employ fuzzy linguistic 
quantifiers to model the fuzzy majority in group decision making. For readability 
and simplicity we have only shown the application of Zadeh’s calculus of 
linguistically quantified propositions. The use of Yager’s calculus is presented in the 
source papers by Kacprzyk (1984; 1985b, c; 1986a; 1987a) or in the surveys by 
Kacprzyk and Nurmi (1989) or Fedrizzi, Kacprzyk and Nurmi (1989). On the other 
hand, information on some newer solution concepts based on individual and social 
fuzzy preference relations which are the so-called fuzzy tournaments may be found in 
Nurmi and Kacprzyk (1990). 


4. «SOFT» DEGREES OF CONSENSUS UNDER FUZZY
PREFERENCES AND A FUZZY MAJORITY REPRESENTED
AS A FUZZY LINGUISTIC QUANTIFIER

In this section we will show how to use fuzzy linguistic quantifiers as
representations of a fuzzy majority to define a new «soft» degree of consensus as
proposed in Kacprzyk (1987), and then advanced in Kacprzyk and Fedrizzi (1986,
1988, 1990), and Fedrizzi and Kacprzyk (1988). This degree is meant to overcome
some «rigidness» of conventional degrees of consensus, in which full consensus
(= 1) occurs only when «all the individuals agree as to all the issues». This
may often be counterintuitive, and not consistent with a real human perception
of the very essence of consensus (see, e.g., the citation from a biological
context given in the beginning of the paper). Our new degree of consensus can
therefore be equal to 1, which stands for full consensus, when, say, «most of
the individuals agree as to almost all (of the relevant) issues (options)».

Our point of departure is again a set of individual fuzzy preference relations,
understood as in Section 3 (see, e.g., (8)).

The degree of consensus is now derived in three steps. First, for each pair of
individuals we derive a degree of agreement as to their preferences between all
the pairs of options; next, we pool (aggregate) these degrees to obtain a degree
of agreement of each pair of individuals as to their preferences between Q1 (a
linguistic quantifier such as «most», «almost all», «much more than 50%», ...)
pairs of relevant options; and, finally, we pool these degrees to obtain a
degree of agreement of Q2 (a linguistic quantifier similar to Q1) pairs of
important individuals as to their preferences between Q1 pairs of relevant
options. This is meant to be the degree of consensus sought. The above
derivation process may be formalized by using Zadeh’s calculus of linguistically
quantified propositions outlined in Section 2.

We start with the degree of strict agreement between individuals k1 and k2 as to
their preferences between options $s_i$ and $s_j$

$$
v_{ij}(k_1, k_2) =
\begin{cases}
1 & \text{if } r^{k_1}_{ij} = r^{k_2}_{ij} \\
0 & \text{otherwise}
\end{cases}
\qquad (30)
$$

where here and later on in this section, if not otherwise specified,
k1 = 1, ..., m - 1; k2 = k1 + 1, ..., m; i = 1, ..., n - 1; j = i + 1, ..., n.

Relevance of options is assumed to be a fuzzy set B defined in the set of
options such that $\mu_B(s_i) \in [0, 1]$ is a degree of relevance of option
$s_i$: from 0 standing for «definitely irrelevant» to 1 for «definitely
relevant», through all intermediate values.

Relevance of a pair of options, $(s_i, s_j) \in S \times S$, may be defined in
various ways among which

$$
b^B_{ij} = \frac{\mu_B(s_i) + \mu_B(s_j)}{2} \qquad (31)
$$

is certainly the most straightforward; obviously, $b^B_{ij} = b^B_{ji}$, and the
$b^B_{ii}$’s are irrelevant since they concern the same option.

And analogously for the importance of individuals, I, which is defined as a
fuzzy set in the set of individuals, with $\mu_I(k) \in [0, 1]$, k = 1, ..., m,
representing the importance of individual k, from definitely important (= 1) to
definitely unimportant (= 0) through all intermediate values. Then, the
importance of a pair of individuals, $b^I_{k_1,k_2} \in [0, 1]$, may also be
defined in various ways among which the mean value of type (31) is the most
straightforward, and will be used here too.

The degree of agreement between individuals k1 and k2 as to their preferences
between all the relevant pairs of options is

$$
v_B(k_1, k_2) =
\frac{\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( v_{ij}(k_1, k_2) \wedge b^B_{ij} \right)}
     {\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} b^B_{ij}}
\qquad (32)
$$

The degree of agreement between individuals k1 and k2 as to their preferences
between Q1 relevant pairs of options is

$$
v^B_{Q1}(k_1, k_2) = \mu_{Q1}(v_B(k_1, k_2)) \qquad (33)
$$
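Steps (30) - (33) for a single pair of individuals can be sketched in Python as below. Several points are assumptions rather than statements of the paper: the conjunction "∧" is taken to be the minimum, the form of «most» is an assumed reading of definition (3), and the two preference relations, the relevance degrees and all function names are hypothetical. Exact fractions avoid rounding in the equality test of (30).

```python
from fractions import Fraction as F


def mu_most(x):
    """Assumed membership function of the quantifier «most»."""
    if x >= F(4, 5):
        return F(1)
    if x <= F(3, 10):
        return F(0)
    return 2 * x - F(3, 5)


def to_fractions(rows):
    """Convert decimal literals to exact fractions via their string form."""
    return [[F(str(x)) for x in row] for row in rows]


def agreement_over_relevant_pairs(Ra, Rb, mu_b):
    """v_B(k1, k2) of (32) for two individuals with relations Ra and Rb."""
    n = len(Ra)
    numerator = F(0)
    denominator = F(0)
    for i in range(n - 1):
        for j in range(i + 1, n):
            b_ij = (mu_b[i] + mu_b[j]) / 2                 # (31)
            v_ij = F(1) if Ra[i][j] == Rb[i][j] else F(0)  # (30)
            numerator += min(v_ij, b_ij)                   # "∧" taken as min
            denominator += b_ij
    return numerator / denominator


# Hypothetical data: two individuals, three options, relevance of each option.
Ra = to_fractions([[0, 0.2, 0.6], [0.8, 0, 0.7], [0.4, 0.3, 0]])
Rb = to_fractions([[0, 0.2, 0.9], [0.8, 0, 0.7], [0.1, 0.3, 0]])
mu_b = [F(1), F(1, 2), F(1, 2)]

v_b = agreement_over_relevant_pairs(Ra, Rb, mu_b)   # (32)
v_q1 = mu_most(v_b)                                 # (33)
print(v_b, v_q1)
```

The two individuals agree on two of the three pairs, and the pair on which they disagree carries as much relevance as the most relevant pair on which they agree, which pulls the pooled agreement below the simple share 2/3.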


In turn, the degree of agreement of all the pairs of important individuals as to
their preferences between Q1 relevant pairs of options is

$$
v^{I,B}_{Q1} = \frac{2}{m(m-1)} \cdot
\frac{\sum_{k_1=1}^{m-1} \sum_{k_2=k_1+1}^{m}
      \left( v^B_{Q1}(k_1, k_2) \wedge b^I_{k_1,k_2} \right)}
     {\sum_{k_1=1}^{m-1} \sum_{k_2=k_1+1}^{m} b^I_{k_1,k_2}}
\qquad (34)
$$

and, finally, the degree of agreement of Q2 pairs of important individuals as to
their preferences between Q1 relevant pairs of options, called the degree of
Q1/Q2/I/B-consensus, is

$$
\mathrm{con}(Q1, Q2, I, B) = \mu_{Q2}(v^{I,B}_{Q1}) \qquad (35)
$$

Since the strict agreement (30) may be viewed as too rigid, we can use the
degree of sufficient agreement (at least to degree $\alpha \in [0, 1]$) of
individuals k1 and k2 as to their preferences between options $s_i$ and $s_j$,
defined by

$$
v^\alpha_{ij}(k_1, k_2) =
\begin{cases}
1 & \text{if } |r^{k_1}_{ij} - r^{k_2}_{ij}| \le 1 - \alpha \le 1 \\
0 & \text{otherwise}
\end{cases}
\qquad (36)
$$

Then, following the reasoning (31) - (35), we obtain the degree of sufficient
agreement (at least to degree α) of Q2 pairs of important individuals as to
their preferences between Q1 pairs of relevant options (with replacements
similar to those in Section 3), called the degree of α/Q1/Q2/I/B-consensus,
given by

$$
\mathrm{con}^\alpha(Q1, Q2, I, B) = \mu_{Q2}(v^{I,B,\alpha}_{Q1}) \qquad (37)
$$

We can also explicitly introduce the strength of agreement into (30), and
analogously define the degree of strong agreement of individuals k1 and k2 as to
their preferences between options $s_i$ and $s_j$, e.g., as

$$
v^s_{ij}(k_1, k_2) = s(|r^{k_1}_{ij} - r^{k_2}_{ij}|) \qquad (38)
$$

where $s: [0, 1] \to [0, 1]$ is some function representing the degree of strong
agreement as, e.g.,

$$
s(x) =
\begin{cases}
1 & \text{for } x \le 0.05 \\
-10x + 1.5 & \text{for } 0.05 < x < 0.15 \\
0 & \text{for } x \ge 0.15
\end{cases}
\qquad (39)
$$

such that $x' < x'' \Rightarrow s(x') \ge s(x'')$, for all $x', x'' \in [0, 1]$,
and s(x) = 1 for some $x \in [0, 1]$.
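The three notions of pairwise agreement, strict (30), sufficient (36) and strong (38) with the function (39), can be compared side by side in a small Python sketch. The function names are hypothetical, and exact fractions are used so that comparisons at the boundaries of (36) and (39) are not affected by floating-point rounding.

```python
from fractions import Fraction as F


def strict_agreement(a, b):
    """(30): 1 only when the two preference degrees coincide."""
    return F(1) if a == b else F(0)


def sufficient_agreement(a, b, alpha):
    """(36): 1 when the degrees differ by at most 1 - alpha."""
    return F(1) if abs(a - b) <= 1 - alpha else F(0)


def s(x):
    """(39): piecewise-linear, non-increasing strength of agreement."""
    if x <= F(1, 20):
        return F(1)
    if x >= F(3, 20):
        return F(0)
    return F(3, 2) - 10 * x


def strong_agreement(a, b):
    """(38): strength of agreement as a function of the difference."""
    return s(abs(a - b))


# Two individuals whose preference degrees for one pair of options differ by 0.1.
a, b = F("0.6"), F("0.7")
strict = strict_agreement(a, b)
sufficient = sufficient_agreement(a, b, F("0.9"))
strong = strong_agreement(a, b)
print(strict, sufficient, strong)
```

For a difference of 0.1 the strict variant reports no agreement, the sufficient variant with α = 0.9 reports full agreement, and the strong variant reports an intermediate degree of 1/2.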

Then, following the reasoning (31) - (35) (with replacements similar to those in
Section 3), we obtain the degree of strong agreement of Q2 pairs of important
individuals as to their preferences between Q1 pairs of relevant options, called
the degree of s/Q1/Q2/I/B-consensus, as

$$
\mathrm{con}^s(Q1, Q2, I, B) = \mu_{Q2}(v^{I,B,s}_{Q1}) \qquad (40)
$$

Example 4. Suppose that n = m = 3, Q1 = Q2 = «most» are given by (3), α = 0.9,
s(x) is defined by (39), and the individual preference relations (rows indexed
by i, columns by j) are

$$
R^1 = [r^1_{ij}] = \begin{bmatrix}
0.0 & 0.1 & 0.6 \\
0.9 & 0.0 & 0.7 \\
0.4 & 0.3 & 0.0
\end{bmatrix}
\qquad
R^2 = [r^2_{ij}] = \begin{bmatrix}
0.0 & 0.1 & 0.7 \\
0.9 & 0.0 & 0.7 \\
0.3 & 0.3 & 0.0
\end{bmatrix}
\qquad
R^3 = [r^3_{ij}] = \begin{bmatrix}
0.0 & 0.2 & 0.6 \\
0.8 & 0.0 & 0.7 \\
0.4 & 0.3 & 0.0
\end{bmatrix}
$$

Now, we assume that $B = 1/s_1 + 0.6/s_2 + 0.2/s_3$, i.e. $b^B_{12} = 0.8$,
$b^B_{13} = 0.6$ and $b^B_{23} = 0.2$, and $I = 0.8/1 + 1/2 + 0.4/3$, i.e.
$b^I_{12} = 0.9$, $b^I_{13} = 0.6$ and $b^I_{23} = 0.7$.

Therefore:

$$
\mathrm{con}(\text{«most»}, \text{«most»}, I, B) = 0.35
$$

$$
\mathrm{con}^{0.9}(\text{«most»}, \text{«most»}, I, B) = 1.0
$$

$$
\mathrm{con}^{s}(\text{«most»}, \text{«most»}, I, B) = 0.75
$$





For more information on these degrees of consensus, see Fedrizzi and Kacprzyk
(1988), Kacprzyk (1987a) and Kacprzyk and Fedrizzi (1986, 1988, 1989). Moreover,
the use of Yager’s fuzzy-logic-based calculus of linguistically quantified
propositions is presented in Kacprzyk and Fedrizzi (1989).


5. CONCLUDING REMARKS 

In this paper we have tried to show how fuzzy logic with linguistic quantifiers
can be used to model a fuzzy majority, and then to define new solution concepts
and degrees of consensus based on the fuzzy majority. Fuzzy quantifiers are
certainly a natural way of representing a fuzzy majority, which cannot in
practice be adequately represented by conventional formal means. Moreover,
fuzzy-logic-based calculi of linguistically quantified propositions, in
particular the one employed in this paper, offer much simplicity and intuitive
appeal, and can help attain group decision making and consensus formation
models that are more human-consistent, hence more adequate and easier to
implement.


BIBLIOGRAPHY 

ARROW K.J. (1963), Social Choice and Individual Values, 2nd ed. Yale University
Press, New Haven. 

BLIN J.M. and A.P. WHINSTON (1973), Fuzzy sets and social choice. Journal of
Cybernetics, 4, 17 - 22. 

BRAYBROOKE D. and C. LINDBLOM (1963), A Strategy of Decision. Free Press,
New York. 

CALVERT R. (1986), Models of Imperfect Information in Politics. Harwood 
Academic Publishers, Chur. 

FEDRIZZI M. (1986), Group decisions and consensus: a model using fuzzy sets 
theory (in Italian). Rivista per le scienze econ. e soc. A. 9, F. 1, 12 - 20. 

FEDRIZZI M. and J. KACPRZYK (1988), On measuring consensus in the setting of 
fuzzy preference relations. In J. Kacprzyk and M. Roubens (Eds.), Non - 
Conventional Preference Relations in Decision Making. Springer - Verlag, 
Berlin - New York - Tokyo, 129 - 141. 

FEDRIZZI M., J. KACPRZYK and S. ZADROZNY (1988), An interactive multi - user 
decision support system for consensus reaching processes using fuzzy logic 
with linguistic quantifiers. Decision Support Systems 4, 313 -327. 

KACPRZYK J. (1984), Collective decision making with a fuzzy majority rule. Proc. 
WOGSC Congress, AFCET, Paris, 153-159. 

KACPRZYK J. (1985a), Zadeh’s commonsense knowledge and its use in
multicriteria, multistage and multiperson decision making. In M.M. Gupta et 
al. (Eds.), Approximate Reasoning in Expert Systems, North - Holland, 
Amsterdam, 105-121. 

KACPRZYK J. (1985b), Some « commonsense » solution concepts in group decision 
making via fuzzy linguistic quantifiers. In J. Kacprzyk and R.R. Yager 
(Eds.), Management Decision Support Systems Using Fuzzy Sets and 
Possibility Theory. Verlag TÜV Rheinland, Cologne, 125-135.





KACPRZYK J. (1985c), Group decision - making with a fuzzy majority via 
linguistic quantifiers. Part I: A consensory - like pooling; Part II: A 
competitive - like pooling. Cybernetics and Systems: an Int. Journal 16,
119 - 129 (Part I), 131 - 144 (Part II).

KACPRZYK J. (1986a), Group decision making with a fuzzy linguistic majority.
Fuzzy Sets and Systems 18, 105 - 118.

KACPRZYK J. (1986b), Towards an algorithmic/procedural «human consistency » of 
decision support systems: a fuzzy logic approach. In W. Karwowski and A. 
Mital (Eds.), Applications of Fuzzy Sets in Human Factors. Elsevier, 
Amsterdam, pp. 101 - 116. 

KACPRZYK J. (1987a), On some fuzzy cores and «soft» consensus measures in 
group decision making. In J.C. Bezdek (Ed.), The Analysis of Fuzzy 
Information, Vol. 2. CRC Press, Boca Raton, pp. 119-130. 

KACPRZYK J. (1987b), Towards «human consistent » decision support systems 
through commonsense - knowledge - based decision making and control 
models: a fuzzy logic approach. Computers and Artificial Intelligence 6,
97-122.

KACPRZYK J. and FEDRIZZI M. (1986), «Soft» consensus measures for monitoring 
real consensus reaching processes under fuzzy preferences. Control and 
Cybernetics 15, 309-323. 

KACPRZYK J. and FEDRIZZI M. (1988), A «soft» measure of consensus in the 
setting of partial (fuzzy) preferences. European Journal of Operational 
Research 34, 315-325. 

KACPRZYK J. and FEDRIZZI M. (1989), A «human-consistent» degree of consensus 
based on fuzzy logic with linguistic quantifiers. Mathematical Social 
Sciences 18, 275-290. 

KACPRZYK J. and FEDRIZZI M., Eds. (1990), Multiperson Decision Making 
Models Using Fuzzy Sets and Possibility Theory. Kluwer, Dordrecht - 
Boston - Lancaster - Tokyo. 

KACPRZYK J., FEDRIZZI M. and NURMI H. (1990), Group decision making with 
fuzzy majorities represented by linguistic quantifiers. In J.L. Verdegay and 
M. Delgado (Eds.): Approximate Reasoning Tools for Artificial Intelligence. 
Verlag TUV Rheinland, Cologne, 126-145. 

KACPRZYK J. and NURMI H (1989), Linguistic quantifiers and fuzzy majorities for 
more realistic and human-consistent group decision making, in G. Evans, W. 
Karwowski and M. Wilhelm (Eds.): Fuzzy Methodologies for Industrial and 
Systems Engineering, Elsevier, Amsterdam, 267-281. 

KACPRZYK J. and NURMI H. (1990), On fuzzy tournaments and their solution 
concepts in group decision making. European Journal of Operational 
Research (forthcoming). 

KACPRZYK J. and ROUBENS M., Eds. (1988), Non - Conventional Preference 
Relations in Decision Making. Springer - Verlag, Berlin - New York - 
Tokyo. 

KACPRZYK J. and YAGER R.R. (1984a), Linguistic quantifiers and belief 
qualification in fuzzy multicriteria and multistage decision making. Control 
and Cybernetics 13, 155-173. 

KACPRZYK J. and YAGER R.R. (1984b), «Softer» optimization and control models
via fuzzy linguistic quantifiers. Information Sciences 34, 157-178. 





KACPRZYK J., S. ZADROZNY and M. FEDRIZZI (1988), An interactive user - 
friendly decision support system for consensus reaching based on fuzzy logic 
with linguistic quantifiers. In M.M. Gupta and T. Yamakawa (Eds.): Fuzzy 
Computing. Elsevier, Amsterdam, 307-322. 

LOEWER B., Guest Ed. (1985), Special Issue on Consensus. Synthese 62, No. 1.

LOEWER B. and LADDAGA R. (1985), Destroying the consensus. In Loewer (1985), 
79-96. 

MCKELVEY R.D. (1979), General Conditions for Global Intransitivities in Formal 
Voting Models. Econometrica 47, 1085-1111. 

NURMI H. (1981), Approaches to collective decision making with fuzzy preference 
relations. Fuzzy Sets and Systems 6, 249-259. 

NURMI H. (1983), Voting procedures: a summary analysis. British Journal of 
Political Science 13, 181-208. 

NURMI H. (1987), Comparing Voting Systems. Reidel, Dordrecht - Boston - 
Lancaster - Tokyo. 

NURMI H. (1988), Assumptions on individual preferences in the theory of voting 
procedures. In Kacprzyk and Roubens (1988), pp. 142-155. 

NURMI H., M. FEDRIZZI and J. KACPRZYK (1990), Vague notions in the theory of 
voting. In J. Kacprzyk and M. Fedrizzi (Eds.): Multiperson Decision Making 
Models Using Fuzzy Sets and Possibility Theory. Kluwer, Dordrecht - 
Boston - Lancaster - Tokyo, 43-52. 

SCHOFIELD N. (1984), Existence of Equilibrium on a Manifold. Mathematics of 
Operations Research 9, 545-557. 

SIMON H.A. (1972), Theories of Bounded Rationality. In C.B. McGuire and R. 
Radner (Eds.): Decision and Organization. North-Holland, Amsterdam. 

SATTERTHWAITE M. (1975), Strategy-proofness and Arrow’s Conditions: Existence
and Correspondence Theorem for Voting Procedures and Social Welfare 
Functions. Journal of Economic Theory 10, 187-217. 

TANINO T. (1988), Fuzzy preference relations in group decision making. In 
Kacprzyk and Roubens (1988), 54-71. 

YAGER R.R. (1983a), Quantifiers in the formulation of multiple objective decision 
functions. Information Sciences 31, 107-139. 

YAGER R.R. (1983b), Quantified propositions in a linguistic logic. International 
Journal of Man - Machine Studies 19, 195-227. 

YAGER R.R. (1984), General multiple - objective decision functions and 
linguistically quantified statements. International Journal of Man - Machine 
Studies 21, 389 - 400. 

YAGER R.R. (1985a), Reasoning with fuzzy quantified statements: Part I. 
Kybernetes 14, 233-240.

YAGER R.R. (1985b), Aggregating evidence using quantified statements. 
Information Sciences 3, 179-206. 

YAGER R.R. (1986), Reasoning with fuzzy quantified statements: Part II.
Kybernetes 15, 111-120.

ZADEH L.A. (1983), A computational approach to fuzzy quantifiers in natural 
languages. Computers and Mathematics with Applications 9, 149-184. 

ZADEH L.A. (1985), Syllogistic reasoning in fuzzy logic and its application to 
usuality and reasoning with dispositions. IEEE Transactions on Systems, 
Man and Cybernetics SMC - 15, 754-763. 



14 

LEARNING IN UNCERTAIN 
ENVIRONMENTS 


Marco Botta, Attilio Giordana and Lorenza Saitta 


Università di Torino
Dipartimento di Informatica 
Corso Svizzera 185 
10149 TORINO (Italy) 
E-mail: saitta@di.unito.it 


ABSTRACT 

In this paper we briefly survey the problems arising in learning concept 
descriptions from examples in domains affected by uncertainty and vagueness. A 
programming environment, called SMART-SHELL, is also presented: it addresses 
these problems, exploiting fuzzy logic. This is achieved by supplying the learning 
system with the capability of handling a fuzzy relational database, containing the 
extensional representation of the acquired logic formulas. 

INTRODUCTION 

Knowledge acquisition has been recognized as a major problem for the quick
and low-cost development of expert systems. In fact, knowledge elicitation is a
hard and time-consuming task, especially in domains where there is a lack or
shortage of human experts and/or the knowledge is difficult to formalize. As a
consequence, automated learning methods have become appealing and machine
learning is now receiving increasing attention.

Even though the complete automation of the knowledge acquisition process is
beyond the possibilities of current AI technology, developing tools that allow
a substantial part of the necessary knowledge to be first acquired, and then
maintained and updated, is both a reachable medium-term goal and a very useful
one. These tools are likely to become, in the future, a fundamental part of
expert system builders, provided that adequate interfaces towards knowledge
engineers and domain experts are supplied.

Traditionally, machine learning tasks have ranged from acquiring concept
descriptions from examples [1-4] to improving planning heuristics [5-7], and
knowledge representation schemes have included logical formulas, decision trees
(or networks), production rules and semantic networks [8-10,11,31]. Research on
scientific discovery [12-14] and concept formation [7,15-17] has also received
attention. In all these problems, the notion of learning as a search process,
in a space of descriptions or hypotheses, plays a central role [18], especially
in inductive approaches.

Recently, new trends have emerged, such as the proposal of chunking as a
general cognitive architecture [19] and the use of deductive methods to perform
"justified" learning [5,20-24]. As new, more complex tasks are faced, methods
become more refined and integrated models of learning are proposed with the hope
of coping with the complexity of real-world tasks [25-28].

There is also a great deal of activity in the field of connectionist models of
learning, as appears, for instance, from [29]. Another interesting approach is
the genetic algorithm [30], presented as a general-purpose learning method for
parallel rule systems.

Unfortunately, many learning systems only work in ideal domains, in which
noise in the data and uncertainty in the task are absent. However, the effective
use of learning systems in real-world applications substantially depends upon
the ability these systems show in handling noise. Some of the problems arising
in real applications are summarized in [31].

Several systems are provided with mechanisms for facing statistical noise,
such as the pruning techniques proposed to limit the sizes of decision trees
[32-34]. Similar methods have also been proposed for knowledge represented in
the form of production rules, as in the AQ15 system, where the initially
acquired rules are truncated to limit complexity and avoid overfitting [35].
This kind of noise mainly concerns random errors in assigning a value to an
attribute or a label to a training event. Several experiments have been
performed to investigate the effects of this noise on the effectiveness of the
acquired knowledge [36].

However, statistical noise is not the only source of problems; in fact,
relevant concepts and relations can be ill-defined and vague. Fuzzy set theory
seems the most appropriate tool for handling this type of uncertainty. It
should be noted that a continuous-valued semantics, associated with the
description language, is a major source of complexity in learning
methodologies. Hence, very few systems are able to handle it explicitly, and
most of these limit themselves to attaching weights to the pieces of acquired
knowledge [37,38].

Fuzzy sets occur in symbolic learning methodologies with different roles.
In [39], they are used to describe concepts and the varying degrees of
typicality of their instances. In [40] the intensional descriptions of a set of
classes, to be discriminated from each other, are expressed as fuzzy languages,
learned from a set of examples. This approach has been applied to problems in
medical diagnosis [41].

Finally, in ML-SMART, a system which learns concept descriptions from
examples [42,43] and a domain theory [26,27], fuzzy set theory has proved to be
well suited to transforming continuous-valued features into a set of
categorical attributes and, in general, to defining the vague semantics
associated with real-world terms and predicates, both in the description of the
examples and in the domain theory. ML-SMART is a learning system which uses a
full memory approach, supported by the special-purpose shell SMART-SHELL
[44,45], especially designed to ease the development of different learning
systems. SMART-SHELL mainly consists of a logic programming environment,
interfaced to a relational data-base through a set of operators implementing
the basic primitives necessary for a learner. The logic environment has been
realized in Common Lisp, whereas the data-base manager has been tailored to the
specific class of applications. This data-base differs from commercial
relational data-bases in that many standard features have not been implemented,
being irrelevant to its intended use, whereas other important aspects have been
enhanced: a query language based on full first-order logic, including a set of
non-standard quantifiers, and the capability of handling continuous-valued
semantics.

This paper is organized as follows. Section 2 briefly describes the learning 
framework used in systems like ML-SMART. Section 3 describes the logic language 
used for representing both the background knowledge and the acquired knowledge, 
whereas Section 4 illustrates the behaviour of the basic operators, interfacing the 
data-base and the learning system. Finally, Section 5 presents some conclusions.

THE LEARNING FRAMEWORK

The learning task addressed by the ML-SMART system is that of "learning
concept descriptions from examples" [45]. More precisely, the task can be
formally defined as follows:

Given: A set $H_0$ of known concepts (classes),

       A set $F_0$ of classified learning events, represented in an event
       description language $L_E$,

       A concept description language L.

Find:  For every concept $h \in H_0$, a formula $\varphi \in L$ such that
       $\varphi \to h$, i.e., φ consists of a set of conditions sufficient for
       an example to be an instance of h.



Fig. 1 - Example of instances from a block world domain. 


One of the peculiarities distinguishing ML-SMART from other systems, devoted to 
the same task, consists in the use of a relational data-base as a working memory. In 
particular, both the learning set Fq and every inductive hypothesis 9 e L, generated 
during the search, are described extensionally using relations in the data-base. In the 
following 9 * will denote the extensional representation of the logical formula 9 . 

As an illustration, we will use a simple example from a block
world domain. Consider the set $F_0$ of instances reported in Fig. 1; they can be
described by means of a set of relations, two of which are reported in Fig. 2(a). All
the relations have the homogeneous format $\langle F, H, X_1, X_2, \ldots, X_n \rangle$, where F contains
the identifier of the event f, H the correct classification h, and $X_1, X_2, \ldots, X_n$ are the
identifiers of the parts of f (objects) which satisfy the relation; later on, we will
slightly extend this basic scheme. By using the standard operators of relational
algebra [46], such as natural join, selection and projection, the extension $\varphi^*$ of a
generic formula $\varphi$ can be computed from the extensions of the predicates occurring in
$\varphi$; Fig. 2(b) shows the extension of the formula $\varphi$ = Triangle(x) $\wedge$ Large(x).
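The computation of an extension from the extensions of its predicates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relations are modelled as sets of tuples in the (F, H, X1) format, the rows are hypothetical rather than the contents of Fig. 2(a), and `natural_join` is an illustrative helper, not a SMART-SHELL primitive.

```python
# Hypothetical relations in the <F, H, X1> format; the rows are made up
# for illustration and are not the contents of Fig. 2(a).
triangle = {("EX1", 1, "a"), ("EX2", 2, "h"), ("EX3", 1, "j"), ("EX3", 1, "k")}
large = {("EX1", 1, "a"), ("EX1", 1, "b"), ("EX2", 2, "h"), ("EX3", 1, "j")}


def natural_join(left, right):
    """Natural join of two relations that share all columns (F, H, X1).

    Because every column is a join column, the result keeps exactly the
    tuples that occur in both relations.
    """
    return {row for row in left if row in right}


# Extension of Triangle(x) AND Large(x)
extension = natural_join(triangle, large)
for row in sorted(extension):
    print(row)
```

When the two predicates share only some variables, the join would match on F, H and the shared object columns and concatenate the remaining ones; the case above is the simplest instance.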



(a) Relations Circle(x1) and Large(x1) [table contents not recoverable]

(b) Triangle(x1) $\wedge$ Large(x1)

F    H  x1
EX1  1  a
EX2  2  h
EX3  1  j

Fig. 2 - Example of relations associated with simple formulas, evaluated on the
instances of Fig. 1. Relations in (a) are given by the teacher; the one in
(b) is computed from the preceding ones. By definition, examples 1 and 3
are instances of a concept $h_1$, whereas example 2 is an instance of another
concept $h_2$.

The learning process can be modeled as a search through the space of formulas which
can be generated in this way [18,43]. However, the set of formulas having a non-empty
extension may be too large to be searched exhaustively. For this
reason, ML-SMART uses several strategies for limiting the number of formulas
which are actually created and tested. In particular, it develops a tree of formulas
using a set of specialization operators; the root of the tree is the maximally general
formula "true", which obviously holds for all the events in $F_0$, and the leaves are
either formulas corresponding to acceptable concept descriptions or formulas which
are no longer interesting. Three kinds of criteria are used to bias the inductive process
in order to limit the size of the tree:

• Simplicity and readability of the formulas. 

• Statistical criteria: formulas verified by many examples are preferred. 

• If background knowledge is available, formulas which can be deduced from it are 
preferred. Moreover, formulas contradicting the background knowledge cannot be 
generated. An extensive description of the methodology can be found in [42,43]. 

The SMART-SHELL environment provides the basic operators of
specialization (and generalization) necessary to implement a problem solver of the
type of ML-SMART, as well as a forward/backward inference engine, and the
primitives necessary for implementing the search strategies. The relational data-base
is a special-purpose one, implemented in such a way as to achieve high speed on the
most critical operations. On top of this data-base, a logic environment has been
implemented, as well as a user interface, designed to ease the process of supplying
the system with the background knowledge and application description.

The tool SMART-SHELL basically consists of three main modules: 
SMART-CONF, SMART-RUN and SMART-DATABASE which provide the user 
with a knowledge editor, a set of high level primitives and a data-base manager, 
respectively. The scheme of the system is reported in Fig. 3. 

The module SMART-CONF consists of a user-friendly interface, usable to
describe both the background knowledge in input to the learner and other kinds of
knowledge (control knowledge) which have to be used by the learning strategies.
Moreover, it contains a set of compilation procedures which translate this kind of
knowledge into a more efficient form, internally used by the other two modules.

SMART-RUN basically implements the logic environment and the learning
operators which will be described in the next sections, whereas SMART-DATABASE
implements the low level procedures necessary for performing set
intersection, natural join, selection and so on. SMART-RUN is embedded in a
standard Common Lisp environment, whereas SMART-DATABASE is implemented
in C, for the sake of efficiency.

The learning system is implemented on top of SMART-RUN and basically 
consists of a set of high level learning strategies which guide the application of the 
basic deductive and inductive operators. When such strategies are to be very 
sophisticated, it is a good practice to implement them as knowledge intensive 
procedures; to this aim the module SMART-CONF turns out to be useful again as a 
true expert system shell. 


Fig. 3 - Scheme of the SMART-SHELL environment.


KNOWLEDGE REPRESENTATION 

In the SMART-SHELL environment, a first order logic language L is used
to describe in a unified form all the knowledge involved in the learning process: the
background knowledge (given in input to the learner), and the concept descriptions
(the learned knowledge). In addition, in order to facilitate the use of the system by
human experts, who may not be familiar with the abstract logic notation,
SMART-SHELL offers a frame system, in the style of many standard expert system
shells, which allows the knowledge to be also represented in an equivalent object-like
paradigm. A compiler automatically performs the conversion from the frame-format
representation to the logical one. The frame system can also be used as a tool for
implementing the learner itself, as has been done for ML-SMART [43].

The logic language L is a Horn clause language, extended with functors,
negation and quantifiers. In particular, a well formed formula (wff) of L takes the
form:

$\varphi(s_1, s_2, \ldots, s_n) \rightarrow p(t_1, t_2, \ldots, t_k)$ (1)

where p is a predicate belonging to a predicate set P, the terms $t_1, t_2, \ldots, t_k$ and
$s_1, s_2, \ldots, s_n$ can be variables, constants or functions and $\varphi$ is a logical expression
built up using predicates in the set P, the connectives $\wedge$ and $\neg$, and the quantifiers
ATM, ATL and EX. These quantifiers stand for ATMost, ATLeast and EXactly,
respectively, and can be considered as an extension of the standard existential
quantifier (similar to the numeric quantifiers used in the system INDUCE [44]).
Fuzzy quantifiers are a very important extension to logical languages and have been
proposed and deeply analyzed by Zadeh [49]. More precisely, let $\psi(x_1, x_2, \ldots, x_m)$ be a
logical expression built up using only the connectives $\wedge$ and $\neg$; then, the expression:

ATL n $\langle y_1, y_2, \ldots, y_k \rangle$ $[\psi(x_1, x_2, \ldots, x_m)]$, with $\{y_1, y_2, \ldots, y_k\} \subseteq \{x_1, x_2, \ldots, x_m\}$ (2)

is true of a given example f iff there exist at least n different bindings between the
variables $y_1, y_2, \ldots, y_k$ and the objects occurring in f that satisfy $\psi$. In an
analogous way, ATM n $\langle y_1, y_2, \ldots, y_k \rangle$ $[\psi(x_1, x_2, \ldots, x_m)]$ and EX n $\langle y_1, y_2, \ldots, y_k \rangle$
$[\psi(x_1, x_2, \ldots, x_m)]$ require at most n and exactly n different bindings in order to be
satisfied. Notice that, for n = 1, the quantifier ATL n corresponds to the existential
quantifier $\exists$, whereas, for n = 0, the quantifier ATM n corresponds to $\neg\exists$. Quantifiers
can be nested according to the usual rules of the predicate calculus. For instance, the
expression:

ATL 1 <x> [EX 2 <y> [Triangle(x) $\wedge$ Circle(y)]] (3)

is an example of a wff of the language L (provided that the predicates Triangle and
Circle belong to P).
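The two-valued reading of the three quantifiers amounts to counting bindings, which the following Python sketch makes explicit. It is a minimal illustration, assuming that an example is a collection of object identifiers and that a binding may assign any object to any quantified variable; the function names and the toy example are hypothetical.

```python
from itertools import product


def count_bindings(objects, arity, psi):
    """Number of distinct bindings of `arity` variables to objects satisfying psi."""
    return sum(1 for binding in product(objects, repeat=arity) if psi(*binding))


def atl(n, objects, arity, psi):
    """ATL n: at least n different bindings satisfy psi."""
    return count_bindings(objects, arity, psi) >= n


def atm(n, objects, arity, psi):
    """ATM n: at most n different bindings satisfy psi."""
    return count_bindings(objects, arity, psi) <= n


def ex(n, objects, arity, psi):
    """EX n: exactly n different bindings satisfy psi."""
    return count_bindings(objects, arity, psi) == n


# Hypothetical example made of one triangle and two circles.
shapes = {"a": "triangle", "b": "circle", "c": "circle"}
objects = list(shapes)


def is_triangle(x):
    return shapes[x] == "triangle"


def is_circle(x):
    return shapes[x] == "circle"


def is_square(x):
    return shapes[x] == "square"


print(atl(1, objects, 1, is_triangle))  # existential quantifier
print(ex(2, objects, 1, is_circle))
print(atm(0, objects, 1, is_square))    # "there is no square"
```

With n = 1, `atl` behaves as the existential quantifier, and with n = 0, `atm` behaves as its negation, matching the remark in the text.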

However, some structural restrictions are imposed on the formulas of L. In
particular, the set of basic predicates P is divided into two disjoint subsets, $P^{(o)}$ and
$P^{(n)}$. The set $P^{(o)}$ contains predicates whose extension can be evaluated on the learning
set by means of queries to the data-base manager; examples of predicates belonging
to $P^{(o)}$ are the ones reported in Fig. 2(a). By contrast, predicates in $P^{(n)}$ are defined
by means of implication rules such as (1). We recall that a predicate is evaluable
when the data-base manager has a procedure for computing, through a selection
operation, its extension from a given relation [47]; examples of standard evaluable
predicates are the arithmetic predicates >, < and =. According to the definition given in
[26], predicates in $P^{(o)}$ will be called operational and predicates in $P^{(n)}$ non-operational.

Given the set of concepts $H_0 = \{h_1, h_2, \ldots, h_n\}$, each concept $h_j$ corresponds
to a non-operational predicate, which is true of f iff f is an instance of $h_j$. Concept
descriptions are wffs of the type:

$\varphi^{(o)} \rightarrow h_j$ ($h_j \in H_0$) (4)

where $\varphi^{(o)}$ is a conjunctive wff containing only operational predicates. As, in
general, more than one formula (4) is needed to completely define $h_j$, all these
formulas are considered implicitly OR-ed. In the current implementation, the
following restrictions on wffs are set:

• Predicates occurring within the scope of a quantifier must belong to the set $P^{(o)}$.

• Negation ($\neg$) can be used only in formulas of the following format:

$\varphi(x_1, x_2, \ldots, y_1, y_2, \ldots, y_m) \wedge \neg\psi(y_1, y_2, \ldots, y_m) \rightarrow p(\cdot)$ (5)

Expression (5) states that variables occurring in a negated predicate must also occur
in a non-negated one in the same formula. In this way, a simple extension of
SLD-resolution can be used as inference engine.

In order to cope with the vagueness invariably associated with real-world
applications, a continuous-valued semantics has been associated with the language L.
Each formula $\varphi(x_1, \ldots, x_n) \in L$ has a corresponding truth degree $\mu \in [0,1]$, computed
by combining the truth degrees of the predicates occurring in $\varphi$. For this reason, the
relation $\varphi^*$, associated with the formula $\varphi$, has been extended (with respect to the
format described in Fig. 2) by adding a new field M, containing the truth value $\mu$ of
$\varphi(x_1, \ldots, x_n)$ when $x_1, \ldots, x_n$ are bound to the objects specified in the corresponding
tuple.

The semantics of an operational predicate can be defined in two ways: 
extensionally, by giving the corresponding relation on the data-base, or intensionally, 
by defining a function on attribute values. These two specification forms can both be
used in the system. In particular, the implicit (intensional) form is more compact and
efficient but needs an analytic definition, whereas the explicit (extensional) form can
always be given by simply filling in a table when an analytic expression is not available. An example of
extensional semantic definition is given in Fig. 4. 



IN(x1, x2) (object x1 is inside object x2)

Fig. 4 - Extensional definition of the predicate IN(x1, x2).

Furthermore, to ease the writing of semantic functions, the learning events $F_0$ are
usually described by means of a set of numerical and categorical attributes $a_1, \ldots, a_n$;
to this aim, a new type of relation has been introduced in SMART-SHELL: the
attribute values can all be collected into a single (n+3)-ary relation, called OBJ. The
fields F, H and X contain the identifier f of an event, the classification h of f and the
identifier x of a part of f, respectively, whereas the other n columns store the values
of the defined attributes for the object x. As an example, the relation OBJ for the set
of instances in Fig. 1 is reported in Fig. 5.

The semantic evaluation of a predicate p, depending on the values of the attributes
$a_1, a_2, \ldots, a_k$, can be obtained by computing the value of a function:

$s(a_1, a_2, \ldots, a_k) : A_1 \times A_2 \times \ldots \times A_k \rightarrow [0,1]$ (6)

where $A_j$ is the domain of the attribute $a_j$. A library of primitive functions has been
defined to this aim. For instance, the semantics of a Boolean predicate, such as 
Triangle(x), can be specified as follows: 

if shape(x) = triangle then $\mu = 1$ else $\mu = 0$.

Analogously, the continuous-valued semantics of the predicate small(x) can be 
assigned as a membership function of the object x in the fuzzy set "small". This can 
be done according to the following syntax: 

fuzzy(0,10,30,40,area(x)) (7) 

Expression (7) states that the fuzzy set "small" has been defined over the base
variable area(x) and has a trapezoidal shape, specified by the four values 0, 10, 30
and 40, expressed in some suitable measurement units. The corresponding fuzzy set is
reported in Fig. 6. An interesting feature of SMART-SHELL is that the user can give
a default semantics for a fuzzy set definition; the system itself, by analyzing the
available examples, can then adjust this definition or even learn it from scratch. This
facility eases the burden on the domain expert of precisely defining the meaning of the
terms he/she uses.
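A trapezoidal membership function of this kind can be sketched as follows. The reading of the four parameters (membership rises from the first to the second value, stays at 1 up to the third, and falls to 0 at the fourth) is the usual convention for trapezoids and is an assumption here, since the exact shape is only given in Fig. 6; the Python function merely mirrors the argument order of expression (7) and is not the SMART-SHELL primitive.

```python
def fuzzy(a, b, c, d, value):
    """Trapezoidal membership degree of `value`.

    Assumed shape: 0 outside (a, d), linear rise on [a, b],
    1 on [b, c], linear fall on [c, d].
    """
    if value <= a or value >= d:
        return 0.0
    if value < b:
        return (value - a) / (b - a)
    if value <= c:
        return 1.0
    return (d - value) / (d - c)


def small(area):
    """Hypothetical semantics of small(x), given the area of x."""
    return fuzzy(0, 10, 30, 40, area)


for area in (5, 20, 35, 50):
    print(area, small(area))
```

Adjusting or learning the definition from examples, as mentioned in the text, would then amount to changing the four parameters.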



Fig. 6 - Fuzzy set defining the semantics of predicate "small(x)".

Given the truth values u and v of two wffs $\varphi$ and $\psi$ of L, the semantics of $\varphi \wedge \psi$ and
of $\varphi \vee \psi$ is computed according to a pair of corresponding t-norm a(u,v) and t-conorm
p(u,v); this evaluation reduces to the classical two-valued one in the case of Boolean
predicates.
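The text does not say which t-norm and t-conorm are adopted, so the Python sketch below shows two common dual pairs purely as assumptions (minimum/maximum and product/probabilistic sum) and checks that each reduces to Boolean AND and OR on the values 0 and 1.

```python
def t_norm_min(u, v):
    """Minimum t-norm (an assumed choice, not stated in the text)."""
    return min(u, v)


def t_conorm_max(u, v):
    """Maximum t-conorm, dual of the minimum t-norm."""
    return max(u, v)


def t_norm_product(u, v):
    """Product t-norm (another assumed choice)."""
    return u * v


def t_conorm_prob_sum(u, v):
    """Probabilistic sum, dual of the product t-norm."""
    return u + v - u * v


# Truth degrees of two hypothetical wffs
u, v = 0.7, 0.4
print(t_norm_min(u, v), t_conorm_max(u, v))
print(t_norm_product(u, v), t_conorm_prob_sum(u, v))

# On Boolean inputs both pairs behave as classical AND / OR
for x in (0, 1):
    for y in (0, 1):
        assert t_norm_min(x, y) == t_norm_product(x, y) == (x and y)
        assert t_conorm_max(x, y) == t_conorm_prob_sum(x, y) == (x or y)
```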

The evaluation of a formula containing a negated predicate (as in formula
(5)) is performed by evaluating the function a(u, 1-v), where u is the evidence of
$\varphi(x_1, x_2, \ldots, y_1, y_2, \ldots, y_m)$ and v the evidence of $\psi(y_1, y_2, \ldots, y_m)$. As for
the fuzzy quantifiers, a semantics of the type proposed by Yager in [48] has been
adopted. Let us consider the formula:

$\varphi(x - y)$ = ATL m <y> $[\psi'(y) \wedge \psi''(x - y)]$ (8)

For the sake of simplicity, the vectors x and y denote, in (8), sets of variables.
Given an example f, let $b_1, \ldots, b_r$ be the different bindings between the variables in y
and the objects in f such that $\psi'(y)$ is true on f. Let, moreover, $\mu_j(\psi')$ be the
evaluation of $\psi'$ for the binding $b_j$ ($1 \le j \le r$). Let us now sort the $\mu_j$'s in
non-increasing order:

$\mu_1 \ge \mu_2 \ge \ldots \ge \mu_r$

Then, the evidence of the quantified formula (8) is computed as follows:

$$\mu(\varphi(x)) = \begin{cases} p(a(\mu_1, \mu_2, \ldots, \mu_m), \mu_{m+1}, \ldots, \mu_r) & \text{if } m \le r \wedge y = x \\ a(p(a(\mu_1, \mu_2, \ldots, \mu_m), \mu_{m+1}, \ldots, \mu_r), \mu(\psi'')) & \text{if } m \le r \wedge y \subset x \\ 0 & \text{otherwise} \end{cases}$$ (9)

The evaluation of the other two fuzzy quantifiers can be derived from the following
relationships:

ATM m <y> $[\psi(x)]$ = $\neg$ ATL (m+1) <y> $[\psi(x)]$ (10)

EX m <y> $[\psi(x)]$ = ATL m <y> $[\psi(x)]$ $\wedge$ ATM m <y> $[\psi(x)]$ (11)

For efficiency reasons, the procedures for evaluating the predicate semantics and for 
updating the evidence of the formulas are handled by the data-base manager 
program. The truth evaluation of a non-operational predicate activates a deductive 
procedure which builds a corresponding operational formula, evaluable as described 
above. 


THE LEARNING OPERATORS 

The description of the learning methodology is outside the scope of this
paper; we will focus, instead, on the mechanisms used to handle the fuzzy relations
associated with the formulas generated during the search. To understand how these
mechanisms work, it is sufficient to know what basic specialization and 
generalization operators can be applied to candidate formulas. The system SMART- 
SHELL provides the basic primitives necessary to search for concept descriptions in 
the space of formulas belonging to the language L. In particular it provides inductive 
operators, namely, specialization and generalization operators, and a deductive 
operator.

The inductive operators

The basic operations available in the inductive part of the system are specialization
and generalization. Each one of them can be performed by applying different
operators, as described in the following.

Specialization operators 

Specialization by detailing. Given a formula $\varphi(x_1, x_2, \ldots, y_1, y_2, \ldots, y_n)$, one way of
obtaining from it a more specific formula $\psi$ is by adding to it a predicate containing
a subset of the variables occurring in $\varphi$:

$\psi(x_1, x_2, \ldots, y_1, y_2, \ldots, y_n) = \varphi(x_1, x_2, \ldots, y_1, y_2, \ldots, y_n) \wedge p(y_1, y_2, \ldots, y_n)$

In this way, the original description is enriched with some new details on the same
objects considered before. Given the relation $\varphi^*$, the extension $\psi^*$ of $\psi$ is built up
by selecting from $\varphi^*$ those tuples satisfying the predicate $p(y_1, \ldots, y_n)$. The relation
$\psi^*$ will have the same number of columns as $\varphi^*$. An example is given in Fig. 7.


$\varphi(x_1, x_2)$ = Triangle(x1) $\wedge$ Large(x1) $\wedge$ Circle(x2)

$\psi(x_1, x_2)$ = Triangle(x1) $\wedge$ Large(x1) $\wedge$ Circle(x2) $\wedge$ On(x1,x2)

Fig. 7 - Example of specialization by detailing.

Specialization by negation. Let $\varphi(x_1, \ldots, x_k, y_1, \ldots, y_n)$ and $p(x_1, \ldots, x_k)$ be two
formulas. Then, the new formula

$\psi(x_1, \ldots, x_k, y_1, \ldots, y_n) = \varphi(x_1, \ldots, x_k, y_1, \ldots, y_n) \wedge \neg p(x_1, \ldots, x_k)$

is obtained by negating the assertion p for the objects bound to $\langle x_1, \ldots, x_k \rangle$ in $\varphi$.
This operator is based on the negation as failure paradigm: given the extensions $\varphi^*$
and $p^*$, the resulting relation is obtained from $\varphi^*$ by removing those tuples which
verify p. An example is given in Fig. 8.

$\psi(x_1)$ = Circle(x1) $\wedge$ $\neg$Clear(x1)

Fig. 8 - Example of specialization by negation.

Specialization by conjunction. Let $\varphi(x_1, \ldots, x_k)$ and $\psi(y_1, \ldots, y_n)$ be two formulas;
the formula $\rho(x_1, \ldots, x_k, y_1, \ldots, y_n) = \varphi(x_1, \ldots, x_k) \wedge \psi(y_1, \ldots, y_n)$ is more specific than
both $\varphi$ and $\psi$. A natural join is performed between the two relations $\varphi^*$ and $\psi^*$. The
resulting relation, an example of which is reported in Fig. 9, will have k+n+2
columns.

$\varphi(x_1)$ = Triangle(x1) $\wedge$ Large(x1)

Fig. 9 - Example of specialization by conjunction.

Specialization by quantification. Let $\varphi(x_1, \ldots, x_j, \ldots, x_k)$ be a formula; we can build
up the quantified expression $\psi$ = q n $\langle x_1, x_2, \ldots, x_j \rangle$ $[\varphi(x_1, \ldots, x_j, \ldots, x_k)]$, where
q $\in$ {EX, ATM, ATL}. The formula $\psi$ is closed with respect to the variables
$\langle x_1, x_2, \ldots, x_j \rangle$ and, therefore, the tuples in the relation $\psi^*$ contain only the variables
$\langle x_{j+1}, \ldots, x_k \rangle$. While the quantifiers ATL and EX are implemented as data-base
operators, ATM is handled by a higher level procedure that uses both the
quantification operators implemented and a set-difference operator described in the
following.

$\varphi(x_1, x_2)$ = Triangle(x1) $\wedge$ Large(x1) $\wedge$ Circle(x2) $\wedge$ On(x1,x2)

$\psi(x_1)$ = ATL 2 <x2> [Obj(x2) $\wedge$ Circle(x2) $\wedge$ On(x1,x2)] $\wedge$ Triangle(x1) $\wedge$ Large(x1)

Fig. 10 - Example of specialization by quantifying.

Quantification is similar to a counting operator: for each example in a relation, it
counts the number of tuples having different bindings to the variables $\langle x_1, \ldots, x_j \rangle$
which exist in that relation; then, it projects the relation over $\langle x_{j+1}, \ldots, x_k \rangle$,
discarding those tuples whose count does not satisfy q n. An example of this operation is
reported in Fig. 10. According to this procedure, the result of the ATM quantifier is
neither a more specific nor a more general formula; in fact, the resulting extension is
not comparable with the original one (it is not always possible to say which is a
subset of the other one).
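The counting view of quantification can be illustrated with a short Python sketch. It assumes a crisp relation stored as a set of tuples and identifies the quantified columns by position; the truth-degree field M is left out for brevity, and the function name, column layout and rows are all hypothetical.

```python
import operator
from collections import defaultdict


def quantify(rows, quantified_cols, n, compare=operator.ge):
    """Count distinct bindings of the quantified columns per remaining tuple.

    rows            : set of tuples, e.g. (F, x1, x2)
    quantified_cols : positions of the quantified variables
    n, compare      : the test applied to the count (>= for ATL, == for EX)
    Returns the projection on the non-quantified columns of the groups
    whose count passes the test.
    """
    groups = defaultdict(set)
    for row in rows:
        kept = tuple(v for i, v in enumerate(row) if i not in quantified_cols)
        bound = tuple(row[i] for i in quantified_cols)
        groups[kept].add(bound)
    return {kept for kept, bindings in groups.items() if compare(len(bindings), n)}


# Hypothetical extension of a formula phi(x1, x2), columns (F, x1, x2)
phi_ext = {("EX1", "a", "b"), ("EX1", "a", "c"), ("EX2", "h", "i")}

print(quantify(phi_ext, (2,), 2))               # ATL 2 <x2>
print(quantify(phi_ext, (2,), 1, operator.eq))  # EX 1 <x2>
```

A tuple with zero satisfying bindings never appears in the input relation, so this counting step alone cannot return it; this is consistent with ATM being handled by a separate procedure based on set difference.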

Generalization operators 

Only one basic generalization operator has been considered, i.e., the one that
performs a disjunction of two formulas having the same number of variables. Let
$\varphi(x_1, \ldots, x_k)$ and $\psi(x_1, \ldots, x_k)$ be two formulas with the same number of variables; the
formula $\rho(x_1, \ldots, x_k) = \psi(x_1, \ldots, x_k) \vee \varphi(x_1, \ldots, x_k)$ is a generalization of both. A
merging operator, similar to the union operator of relational algebra, is used for
implementing this operation. An example is reported in Fig. 11.

$\varphi(x_1)$ = Triangle(x1) $\wedge$ Small(x1)

Fig. 11 - Example of generalization by disjunction.

Using the described basic operators, any kind of formula in the relational calculus,
extended with the above defined non-standard quantifiers, can be built up.


Basic mechanisms for deduction 

As mentioned in Section 3, SMART-SHELL allows the background knowledge to
be described as a Horn clause theory, in which the non-operational predicates $p^{(n)}$ can
be defined by means of implication rules (1). We will now briefly describe how the
standard SLD-resolution mechanism can be extended in order to perform deduction
using all the learning examples belonging to $F_0$ at the same time. Let us consider a
goal expressed in conjunctive form:

$g = p_1^{(o)} \wedge p_2^{(o)} \wedge \ldots \wedge p_1^{(n)} \wedge p_2^{(n)} \wedge \ldots$ (12)

and the implication rule:

$p_1^{(n)} \leftarrow q_1^{(o)} \wedge q_2^{(o)} \wedge q_1^{(n)}$ (13)

Variables are omitted for the sake of simplicity. By applying the basic step of Horn
clause resolution, the goal g can be reduced to the goal g':

$g' = p_1^{(o)} \wedge p_2^{(o)} \wedge \ldots \wedge q_1^{(o)} \wedge q_2^{(o)} \wedge q_1^{(n)} \wedge p_2^{(n)} \wedge \ldots$

where the non-operational predicate $p_1^{(n)}$ has been replaced by the body of the rule
(13), after applying the unification with the terms occurring in g. However, suppose
we know the extension g* on $F_0$ of the operational subformula $p_1^{(o)} \wedge p_2^{(o)} \wedge \ldots$ in
the goal g; then, the extension g'* can be easily computed by specializing g* with
the formula $q_1^{(o)} \wedge q_2^{(o)}$, i.e., by using the specialization operators defined in the
previous section.

This basic deductive step corresponds to the one used in deductive data-bases, 
which utilize the method of "queries and subqueries" [50]; in particular, SMART- 
SHELL incorporates a deductive data-base of this form, which has been obtained by 
extending Robinson's LOGLISP [51]. 

Inductive specialization and deductive steps can be also easily interleaved, 
realizing an effective integration of analytical and empirical learning [27]: in this 
framework, specialization steps allow one to modify the partial operational 
descriptions obtained from the theory, thus improving their classification 
performance. On the other hand, the deductive use of background knowledge supplies 
a skeleton for the inductive process, limits the search space and gives structural 
meaning to the obtained concept descriptions. 

Finally, as the semantics of the predicates can be freely defined by the user, 
he is also allowed to change it dynamically, in the sense that the shell provides a 
mechanism to firstly define non-operational predicates through a set of Horn clauses 
and then move them to an operational state, by deducing their operational form in a 
context free environment. In this case the predicates' semantics is given 
extensionally, by means of the relations built up during the former process. 


CONCLUSIONS 

In this paper we have described the tool SMART-SHELL, designed to ease 
the development of learning systems oriented to classification and diagnostic expert 
systems. The learning framework is based on an integrated paradigm allowing
empirical learning (i.e., induction) and analytic learning (i.e., explanation-based
learning) to be interleaved. This paradigm, which proved very effective in practice,
can be easily implemented using a deductive data-base. The environment
SMART-SHELL can thus be considered a special-purpose deductive data-base, extended
in order to support the development of knowledge-intensive learners. An important
feature of the system is the capability of handling fuzzy relations.

So far, SMART-SHELL has been used to develop four families of learners,
the best known being ML-SMART; they have been applied in several real-world
domains, such as pattern recognition [43] and fault diagnosis of electromechanical
equipment [45], among others. In these applications SMART-SHELL proved to be
reliable and usable even by people who did not participate in the
implementation of the tool itself (one version of ML-SMART has been developed by
SOGESTA s.p.a.). The help obtained in shortening the prototyping time has
been rated excellent when the tool is used by a trained programmer; several prototypes have
been developed in a few weeks.

The facility of handling fuzzy logic was also a key to this success,
especially in diagnostic problems, where coping with the vagueness of the terms used
by a human expert and with the approximation of the measured cues was a must.
Moreover, the possibility of automatically acquiring the required fuzzy set definitions
greatly enhances the system's usefulness.

1. INTRODUCTION

1.1 General Knowledge and evidence 

An expert’s knowledge of an application is concerned with general tendencies, what 
is likely to be the case, frequent conjunctions, rules of thumb and other forms of 
statistical statements. An investigator may know that a certain type of crime is 
common among criminals of a certain type, an insurance company may know that a 
person with certain characteristics is a good risk, a doctor knows that certain
symptoms almost always mean the person is suffering from a given disease. The
conclusion in each of these cases comes from studying tendencies in a population of 
relevant cases and using these to infer something about an individual case. 

Rules of thumb such as 

“most tall persons wear large shoes” 
can be expressed as the rule 

person X wears large shoes IF person X is tall : very likely

X is a variable which can be instantiated to any member assumed to be drawn at
random from the population of persons. This says that the head of the rule is very
likely given that the body of the rule is true. It makes a vague statement about the
conditional probability Pr(large shoes | tall). This probability represents the
proportion of persons who wear large shoes in some population of tall persons. It is a
statement about the population as a whole rather than a statement about any
particular individual person. If an individual person is known to be tall then one can
infer that this person is very likely to wear large shoes. In this
sense the variable in the rule is universally quantified,
i.e. $\forall x$ Pr(x wears large shoes | x is tall) = very likely.

The use of “very likely” rather than a point probability value further complicates
matters. As a first approximation we might equate “very likely” with the interval
[0.9, 1]. This means that Pr(x wears large shoes | x is tall) lies in the interval [0.9,
1]. We could further express this by saying that the necessary support in favour of (x wears large
shoes | x is tall) is 0.9 and the necessary support in favour of (x does not wear large
shoes | x is tall) is 0. The term necessary support can be replaced with the term
“belief”. We can also express this in the form of a mass assignment over the power
set of

{(x wears large shoes | x is tall), (x does not wear large shoes | x is tall)}

namely,

(x wears large shoes | x is tall) : 0.9
(x does not wear large shoes | x is tall) : 0
{(x wears large shoes | x is tall), (x does not wear large shoes | x is tall)} : 0.1

where an assignment of mass m to set Y means m is the probability associated with
exactly Y but not with any subset of Y. The meaning of these various terms will be
expanded upon later in the paper. In order to capture more adequately the true
semantics of the vague statement “very likely” we need to model this linguistic
term using a fuzzy set [ZADEH 1965].

1.2 An Example 

Consider the following simple example. A bag contains 70% red balls and 30% blue 
balls. Each ball is either large or small. 60% of the red balls are large and 40% of the 
blue balls are large. 

Problem 1a

What is the probability that a ball drawn randomly from the bag is large?

Of course this is a very elementary problem and can be solved by fusing the pieces
of information concerning the balls in the bag to calculate this probability. If {y1, y2,
y3, y4} stand for the probabilities {Pr(rl), Pr(rs), Pr(bl), Pr(bs)} respectively and r, l
signify “red”, “large” respectively (likewise b, s for “blue”, “small”) then

y1 + y2 = 0.7 ; y3 + y4 = 0.3

y1 / (y1 + y2) = 0.6 ; y3 / (y3 + y4) = 0.4

so that y1 = 0.42, y2 = 0.28, y3 = 0.12 and y4 = 0.18

from which Pr(l) = y1 + y3 = 0.54.

This is simply a probability logic problem. In the sequel this fusion of probabilistic
information will be done by means of a general assignment method.

Problem 1b

A ball drawn at random from the population is known to be large. What is the
probability that it is red?

The solution is given by y1 / (y1 + y3) = 0.7778 and comes from fusing the given
information using probability logic and calculating the required conditional
probability.

Of course y1 / (y1 + y3) = Pr(l | r)Pr(r) / Pr(l), which is Bayes' theorem applied to this
problem. This can therefore be viewed as an updating problem in which the a priori
distribution {y1, y2, y3, y4} is updated using the certain information that the ball in
question is large.
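The arithmetic of Problems 1a and 1b can be checked with a few lines of Python. This is only a numerical restatement of the equations above; the variable names are illustrative.

```python
# Given proportions for the bag
pr_red, pr_blue = 0.7, 0.3
pr_large_given_red, pr_large_given_blue = 0.6, 0.4

# Joint probabilities y1..y4 = Pr(rl), Pr(rs), Pr(bl), Pr(bs)
y1 = pr_red * pr_large_given_red
y2 = pr_red * (1 - pr_large_given_red)
y3 = pr_blue * pr_large_given_blue
y4 = pr_blue * (1 - pr_large_given_blue)

# Problem 1a: probability that a random ball is large
pr_large = y1 + y3

# Problem 1b: probability of red given large (Bayes' theorem)
pr_red_given_large = y1 / pr_large

print(round(y1, 2), round(y2, 2), round(y3, 2), round(y4, 2))
print(round(pr_large, 2), round(pr_red_given_large, 4))
```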

Problem 2 

The balls in the bag are shown, as a black and white image on a screen, one by one to 
an observer. The observer is then asked if the third ball shown was red. The observer 
believes that the third ball was large but is not certain of this fact. He expresses this 
belief as Pr(third ball shown is large) = 0.8. He does not have information about the 
colours of the balls shown. What should be his belief that it is red? 

One possible answer to this problem is obtained by using Jeffrey’s rule [JEFFREY
1967], namely

Pr(third ball is r) = Pr(r | l) Pr(third ball is l) + Pr(r | s) Pr(third ball is s)
                    = 0.7778 * 0.8 + {y2 / (y2 + y4)} * 0.2
                    = 0.7778 * 0.8 + 0.6087 * 0.2 = 0.744

It looks as if we have used the theorem of total probability, namely,

Pr(third ball is r) = Pr(third ball is r | third ball is l) Pr(third ball is l)
                    + Pr(third ball is r | third ball is s) Pr(third ball is s)

with the assumption that

Pr(third ball is r | third ball is l) = Pr(r | l) and

Pr(third ball is r | third ball is s) = Pr(r | s).
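The figure 0.744 can be reproduced directly from the joint distribution of Problem 1a. The Python lines below simply evaluate Jeffrey's rule with the observer's belief of 0.8 that the ball is large; the variable names are illustrative.

```python
# Joint distribution from Problem 1a: Pr(rl), Pr(rs), Pr(bl), Pr(bs)
y1, y2, y3, y4 = 0.42, 0.28, 0.12, 0.18

pr_red_given_large = y1 / (y1 + y3)
pr_red_given_small = y2 / (y2 + y4)

# Observer's belief that the third ball is large
belief_large = 0.8

# Jeffrey's rule: weight the two conditionals by the uncertain evidence
pr_third_red = (pr_red_given_large * belief_large
                + pr_red_given_small * (1 - belief_large))

print(round(pr_red_given_large, 4), round(pr_red_given_small, 4))
print(round(pr_third_red, 3))
```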

We do not have to make this assumption if the following philosophy is accepted. The
a priori distribution over the labels {rl, rs, bl, bs} is {y1, y2, y3, y4}. This is to be
updated using the specific information Pr'(l) = 0.8, where the ' is used to signify that
this is not the proportion of large balls in the population but a belief in one particular
ball being large. We could update to {y'1, y'2, y'3, y'4} by choosing the {y'i} such
that the relative information

$I = \sum_i y'_i \ln(y'_i / y_i)$

is minimised. This will be discussed further later. This forms the basis of the iterative
assignment method to be discussed in detail in a later section.
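For a single constraint of this kind, the distribution minimising the relative information is known to be the one obtained by rescaling the prior within the "large" and "small" groups, which is what Jeffrey's rule produces. The Python sketch below computes that update for the bag example and checks numerically that moving mass between the two large labels, while keeping their total at 0.8, increases I. It illustrates this special case only and is not the iterative assignment method itself; all names are hypothetical.

```python
import math

prior = {"rl": 0.42, "rs": 0.28, "bl": 0.12, "bs": 0.18}
target_large = 0.8  # belief that this particular ball is large


def rescale_update(prior, target_large):
    """Rescale the prior so that the labels ending in 'l' sum to target_large."""
    pr_large = prior["rl"] + prior["bl"]
    pr_small = prior["rs"] + prior["bs"]
    scale = {"l": target_large / pr_large, "s": (1 - target_large) / pr_small}
    return {label: p * scale[label[1]] for label, p in prior.items()}


def relative_information(new, old):
    """I = sum_i y'_i ln(y'_i / y_i)."""
    return sum(new[k] * math.log(new[k] / old[k]) for k in new if new[k] > 0)


def shifted(dist, eps):
    """Move eps of mass from 'bl' to 'rl'; the total on large is unchanged."""
    out = dict(dist)
    out["rl"] += eps
    out["bl"] -= eps
    return out


posterior = rescale_update(prior, target_large)
pr_red = posterior["rl"] + posterior["rs"]
print(round(pr_red, 3))
print(relative_information(posterior, prior))
print(relative_information(shifted(posterior, 0.01), prior))
```

The updated probability of red agrees with the value obtained from Jeffrey's rule, without assuming that the conditional probabilities for the particular ball equal the population ones.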

1.3 Rule Form of Knowledge Representation

Various forms of knowledge representation can be used to express the general 
tendencies and specific information discussed above. The assignment methods 
mentioned above can be used with various forms of knowledge representation but in 
this paper we will concentrate on a rule form like that used by Prolog but extended to 
allow for uncertainties of both a probabilistic and a fuzzy kind to be expressed. The 
methods developed in this paper are extensions of those used in the AI language 
FRIL, [BALDWIN 1986, 1987] and [BALDWIN et al 1987].

Consider the following Prolog program.

married(X) :- middle_aged(X), has_children(X).

middle_aged(mary).

has_children(mary).

This says that any middle aged person who has children is married and that Mary is
middle aged and has children. We can conclude from this that Mary is married. In
Prolog we ask the query

?- married(mary).

to which we get the reply

yes.

Suppose it is known that at least 70% and at most 90% of middle aged persons who
have children are married. We will define the fuzzy term “middle aged” using the
fuzzy set middle_aged with membership function

$$\chi_{middle\_aged}(x) = \begin{cases} x/5 - 7 & \text{for } 35 \le x < 40 \\ 1 & \text{for } 40 \le x \le 50 \\ -x/5 + 11 & \text{for } 50 < x \le 55 \\ 0 & \text{elsewhere} \end{cases}$$

We also define the fuzzy term “about_35” using the fuzzy set about_35 with
membership function

$$\chi_{about\_35}(x) = \begin{cases} x/5 - 6 & \text{for } 30 \le x < 35 \\ -x/5 + 8 & \text{for } 35 \le x \le 40 \\ 0 & \text{elsewhere} \end{cases}$$
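The two membership functions can be written down directly in Python, which makes it easy to see where the fuzzy sets overlap. The sketch assumes the piecewise-linear definitions given above, with a decreasing branch for about_35 between 35 and 40; the function names are illustrative.

```python
def middle_aged(x):
    """Membership of age x in the fuzzy set middle_aged."""
    if 35 <= x < 40:
        return x / 5 - 7
    if 40 <= x <= 50:
        return 1.0
    if 50 < x <= 55:
        return -x / 5 + 11
    return 0.0


def about_35(x):
    """Membership of age x in the fuzzy set about_35."""
    if 30 <= x < 35:
        return x / 5 - 6
    if 35 <= x <= 40:
        return -x / 5 + 8
    return 0.0


# The two sets overlap between 35 and 40
for age in (30, 35, 37.5, 40, 45):
    print(age, middle_aged(age), about_35(age))
```

Ages between 35 and 40 belong to both sets to some degree, which is why a statement about someone who is "about 35" carries partial information about that person being "middle aged".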

Suppose it is known that Mary is about 35 and it is believed with a probability of at 
least 0.8 that Mary has children. 

A more realistic program is 

married(X) :- age(X, middle_aged), has_children(X) : [0.7, 0.9]. 
age(mary, about_35). 
has_children(mary) : [0.8, 1]. 

which is interpreted as saying that the conditional probability that someone is 
married if the person is middle aged and has children lies between 0.7 and 0.9 and 
that the person Mary is about 35 years old and the probability that Mary has children 
lies between 0.8 and 1. 

We can now ask how to answer the query 
?- married(mary) 

We would expect the answer to take the form of an interval containing the 
probability that Mary is married. 

The methods of inference developed below will allow us to answer this query for this 
program. In deriving this interval both the probabilistic and fuzzy types of 
uncertainty must be taken into account. For example, the rule talks about middle 
aged persons while the age of Mary is given as “about 35”. From a syntactic point of 
view it would appear that the rule has no relevance to Mary but from a semantic 
point of view it does since someone who is “about 35” is to some degree middle 
aged. This degree depends on the definitions of the fuzzy sets “middle aged” and 
“about 35”. In order to answer the query given it is necessary to determine an 
interval containing the conditional probability 
Pr{age(mary, middle_aged) | age(mary, about_35)}. We term this process “semantic 
unification”, [BALDWIN 1990a]. 

When the second argument of the age predicate is always a crisp set then this interval 
is [0, 0], [1, 1] or [0, 1]. For example, 

Pr{age(mary, [40, 50]) | age(mary, [35, 39])} = 0 and 

Pr{age(mary, [35, 45]) | age(mary, [37, 42])} = 1 and 

Pr{age(mary, [40, 50]) | age(mary, [45, 55])} is contained in the interval [0, 1]. 

Semantic unification extends this to the case of fuzzy sets. 
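The crisp case can be sketched as a small function. This is an illustration under our own assumptions (closed intervals represented as pairs, and a hypothetical function name); it is not FRIL's implementation.

```python
def crisp_unify(target, given):
    """Interval containing Pr{age in target | age in given} for closed crisp intervals."""
    t_lo, t_hi = target
    g_lo, g_hi = given
    if t_lo <= g_lo and g_hi <= t_hi:
        return (1, 1)   # the given set lies wholly inside the target
    if g_hi < t_lo or t_hi < g_lo:
        return (0, 0)   # the two sets are disjoint
    return (0, 1)       # partial overlap: nothing more can be said


print(crisp_unify((40, 50), (35, 39)))
print(crisp_unify((35, 45), (37, 42)))
print(crisp_unify((40, 50), (45, 55)))
```

The three calls reproduce the three cases listed above: disjoint sets, containment, and partial overlap.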

A program can also contain universally quantified facts. For example 
middle_aged(X) : [0.4, 0.5] 

would be interpreted as saying that between 40% and 50% of the relevant population 
of objects being considered are middle aged. While X can be instantiated to the 
object mary, the statement 

middle_aged(X) : [0.4, 0.5] with X = mary 

is not interpreted in the same way as 

middle_aged(mary) : [0.4, 0.5]. 

The former is a statement about the population as a whole and if Mary is an object 
drawn at random from this population then Pr{middle_aged(mary)} lies in [0.4, 0.5]. 
The latter statement makes no reference to the population but is a statement made 
about the object Mary by inspecting this object independently of any population 
statistics. The first statement is concerned with a general tendency while the second 
statement is specific to the object concerned. 

1.4 Aims of Paper 

In this paper we will discuss methods for answering queries of the types given above 
from a knowledge base expressed in rule form containing statements representing 
general tendencies and also specific facts. The specific facts expressed in 
probabilistic terms will be used as evidences to update the family of possible apriori 
distributions obtained from the relevant general statements and this update used to 
provide the answer to the given query. The rules and facts can contain both 
probabilistic and fuzzy uncertainties. 

The inference method for processing the knowledge base to answer queries will use 
the following three methods 

(1) general assignment method 

(2) iterative assignment method 

(3) semantic unification. 

In special cases the inference process simplifies to using the theorem of total 
probabilities if only general statements from the same population are used or 
Jeffrey’s rule when both general statements and specific evidences are used. This is 
the inference mechanism of the AI language FRIL. 

Each of these methods requires the information in the form of a mass assignment 
over a frame of discernment whose elements are labels formed from the information. 
We discuss this more fully in the appropriate sections that follow. A general 
treatment will not be given here and the reader is expected to generalise for 
him/herself from those cases discussed. Other aspects can be found in [BALDWIN 
1990b, 1990c]. 

2. MASS ASSIGNMENTS, SUPPORT MEASURES AND SUPPORT PAIRS 

2.1 Labels and Frame of Discernment 

A knowledge base statement, either in the form of a rule or a fact, is converted into 
the form of a mass assignment over a set of labels. Each label is a concatenation of 
instantiations of the proposition variables and the proposition variables come from 
all the information of the knowledge base relevant to answering the given query. The 
following example will illustrate this. A general theory in terms of inference 
diagrams and logic proof paths can be given but space does not allow this to be 
included here. 

Consider the knowledge base 
fly(X) :- bird(X) : [0.9, 0.95]. 
bird(X) :- penguin(X). 
fly(X) :- penguin(X) : [0, 0]. 
penguin(obj) : [0.4, 0.4]. 
bird(obj) : [0.9, 1]. 
etc 

and the query 
?- fly(obj) 

To answer this query we form the following inference diagram 

[Inference diagram not reproduced; its surviving annotations are the support pairs [0.9, 1] for bird(obj) and [0.4, 0.4] for penguin(obj).] 

from which we extract the propositional variables 
B = bird(obj) ; F = fly(obj) ; P = penguin(obj) 

Each of these variables can be instantiated to “true” or “false”. We represent the set 
of instantiations for a propositional variable X by {x, ¬x}. A label is a possible 
instantiation of the concatenation of the three variables B, P and F written as BPF. 
Thus the possible set of labels corresponding to the frame of discernment is 
L = {¬b¬p¬f, ¬b¬pf, ¬bp¬f, ¬bpf, b¬p¬f, b¬pf, bp¬f, bpf}. 

For any knowledge base consisting of facts and rules and for any query, a frame of 
discernment can be established using this method of constructing an inference 
diagram and extracting the propositional variables. The inference diagram is 
obtained by using the unification and backtracking mechanisms of Prolog with 
extensions to include semantic unification as discussed later. 
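Once the propositional variables are known, the frame of discernment can be enumerated mechanically. The sketch below assumes the variables have already been extracted (it does not build the inference diagram) and writes negation as "~"; the helper name is ours.

```python
from itertools import product


def frame_of_discernment(variables):
    """All labels: every variable instantiated to itself (true) or '~'+itself (false)."""
    return [
        "".join(name if value else "~" + name for name, value in zip(variables, values))
        for values in product([False, True], repeat=len(variables))
    ]


# B = bird(obj), P = penguin(obj), F = fly(obj), concatenated as BPF.
L = frame_of_discernment(["b", "p", "f"])
print(L)
```

With three variables this yields the eight labels of L, in the same order as listed in the text.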

2.2 Mass Assignments 

A mass assignment over a finite frame of discernment X is a function 
m : P(X) → [0, 1], where P(X) is the power set of X, 
such that 

$$m(\emptyset) = 0 \quad \text{and} \quad \sum_{A \in P(X)} m(A) = 1$$

and corresponds to the basic probability assignment function of the Shafer / 
Dempster theory of evidence, [SHAFER 1976]. m(A) represents a probability mass 
assigned exactly to A. It does not include any masses assigned to subsets of A. 

As an example consider the specific evidences 
penguin(obj) : [0.4, 0.4]. 
bird(obj) : [0.9, 1]. 
given above. 

The first is equivalent to the following mass assignment over the set of labels L 

m({¬bp¬f, ¬bpf, bp¬f, bpf}) = 0.4 

m({¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf}) = 0.6 

which can be written as 

{_p_} : 0.4 

{_¬p_} : 0.6 

where _ can be instantiated to the appropriate proposition or its negation. 

Similarly the second evidence is equivalent to the mass assignment 
{b__} : 0.9 
{___} : 0.1 

Consider the general statements 
fly(X) :- bird(X) : [0.9, 0.95]. 
bird(X) :- penguin(X). 
fly(X) :- penguin(X) : [0, 0]. 

The second of these clauses says that the labels {¬bpf, ¬bp¬f} are not possible. The 
third says that the label {bpf} is not possible. The two statements combined say that the 
labels {¬bpf, ¬bp¬f, bpf} are not possible. We can therefore express the first clause 
as a mass assignment over the reduced set of labels 

L' = {¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf, bp¬f}, by combining the following two 
evidences, each expressed as a mass assignment over L' 

(1) {b¬p_, bp¬f} : k , {¬b¬p_} : 1-k 

(2) {b¬pf} : 0.9k , {¬b¬p_, b_¬f} : 1-0.9k 

corresponding to 

Pr{bird(obj)} = k , Pr{¬bird(obj)} = 1-k 
and 
Pr{bird(obj) ∧ fly(obj)} = 0.9k , Pr{¬(bird(obj) ∧ fly(obj))} = 1-0.9k 

The combination of these two evidences, using the general assignment method 
defined below, gives the mass assignment over L' as 
{b¬pf} : 0.9k 

{¬b¬p_} : 1-k 

{b_¬f} : 0.1k 

for the combined relevant general statements in the knowledge base. The conditional 
statements of rules can always be treated in this way. Pure logic rules simply reduce 
the set of possible labels. 

2.3 Support Pairs 

We use the concept of belief and plausibility measures of [SHAFER 1976] to define 
necessary support and possible support measures. Names are changed to be 
consistent with the notation used in support logic programming, [BALDWIN 1986] 
and the FRIL language [BALDWIN et al 1987], and to avoid confusion with 
conclusions and derived results based on the use of the Dempster rule of combining 
evidences. The methods given here do not use the Dempster rule and the necessary 
and possible supports are more in keeping with upper and lower probabilities, 
[DUBOIS, PRADE 1986]. 

A necessary support measure is a function 
Sn : P(X) → [0, 1] 

where X is a set of labels and P(X) is the power set of X 
that satisfies the following axioms 

Axiom 1 (boundary condition): Sn(∅) = 0 and Sn(X) = 1, where ∅ is the empty set 

Axiom 2: 

$$Sn(A_1 \cup A_2 \cup \dots \cup A_n) \ge \sum_i Sn(A_i) - \sum_{i<j} Sn(A_i \cap A_j) + \dots + (-1)^{n+1} Sn(A_1 \cap A_2 \cap \dots \cap A_n)$$

for every collection of subsets of X. 

For each A ∈ P(X), Sn(A) is interpreted as the necessary support, based on available 
evidence, that a given label of X belongs to the set A of labels. 

When the sets A_1, ..., A_n in axiom 2 are pairwise disjoint, i.e. 

A_i ∩ A_j = ∅ for all i, j ∈ {1, 2, ..., n} such that i ≠ j 

the axiom requires that the necessary support associated with the union of the sets is 
not smaller than the sum of the necessary supports pertaining to the individual sets. 
The basic axiom of necessary support measures is thus a weaker version of the 
additivity axiom of probability theory. 

It is easy to show that axiom 2 above implies that for every A, B ∈ P(X), if A ⊆ B, 
then Sn(A) ≤ Sn(B) 
and also that 

$$Sn(A) + Sn(\bar{A}) \le 1$$

where $\bar{A}$ denotes the complement of A in X. 

Possible Support Measure 

Associated with each necessary support measure is a possible support measure Sp, 
defined by the equation 

$$Sp(A) = 1 - Sn(\bar{A})$$

for all A ∈ P(X). 

Similarly 

$$Sn(A) = 1 - Sp(\bar{A})$$

Necessary support measures and possible support measures are therefore mutually 
dual and it is easy to show that 

$$Sp(A) + Sp(\bar{A}) \ge 1$$

Given a basic probability assignment m, a necessary support measure and possible 
support measure are uniquely determined by the formulae 

$$Sn(A) = \sum_{B \subseteq A} m(B)$$

and 

$$Sp(A) = \sum_{A \cap B \ne \emptyset} m(B)$$

which are applicable for all A ∈ P(X). 
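The two formulae translate directly into code. The following is a minimal sketch in which a mass assignment is represented, by our own choice, as a dictionary from frozensets of labels to masses, with "~" standing for negation; it uses the evidence bird(obj) : [0.9, 1] restricted to the variables B and P.

```python
def necessary_support(m, A):
    """Sn(A): total mass of the focal elements B that are subsets of A."""
    return sum(mass for B, mass in m.items() if B <= A)


def possible_support(m, A):
    """Sp(A): total mass of the focal elements B that intersect A."""
    return sum(mass for B, mass in m.items() if B & A)


X = frozenset({"bp", "b~p", "~bp", "~b~p"})
bird = frozenset({"bp", "b~p"})
not_bird = X - bird

# bird(obj) : [0.9, 1] puts 0.9 on the bird labels and leaves 0.1 uncommitted on X.
m = {bird: 0.9, X: 0.1}

print(necessary_support(m, bird), possible_support(m, bird))
print(necessary_support(m, not_bird), possible_support(m, not_bird))
```

The printed pairs also illustrate the duality Sp(A) = 1 - Sn(Ā): the possible support for bird is one minus the necessary support for its complement.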

Focal Elements 

Every set A ∈ P(X) for which m(A) > 0 is called a focal element of m. We can 
represent the mass assignment as (m, F) where F is the set of focal elements. 

Total ignorance is expressed in terms of the mass assignment by 
m(X) = 1 and m(A) = 0 for all A ≠ X. 

Using the formula above for Sn in terms of m, we can therefore also express total 
ignorance as 

Sn(X) = 1 and Sn(A) = 0 for all A ≠ X. 

The total ignorance in terms of the possible support measure is 

Sp(∅) = 0 and Sp(A) = 1 for all A ≠ ∅. 

A support pair for A ∈ P(X) is given by [MIN Sn(A), MAX Sp(A)] and this 
defines an interval containing Pr(A), where the MIN and MAX are over the set of 
values of any possible parameters that Sn(A) and Sp(A) may depend on. This will be 
illustrated later. 

3. GENERAL ASSIGNMENT METHOD 

3.1 Combining Mass Assignments 

Let m1 and m2 be two mass assignments over the power set P(X) where X is a set 
of labels. Evidence 1 and evidence 2 are denoted by (m1, F1) and (m2, F2) 
respectively, where F1 and F2 are the sets of focal elements of P(X) for m1 and m2 
respectively. 

Suppose F1 = {L1k} for k = 1, ..., n1 and F2 = {L2k} for k = 1, ..., n2; 
then Lij is an element of P(X) for which mi(Lij) ≠ 0. 

Let (m, F) be the evidence resulting from combining evidence 1 with evidence 2 
using the general assignment method. This is denoted as 
(m, F) = (m1, F1) ⊕ (m2, F2) 
where 

$$F = \{ L_{1i} \cap L_{2j} \mid m(L_{1i} \cap L_{2j}) \ne 0 \}$$

$$m(Y) = \sum_{i,j \,:\, L_{1i} \cap L_{2j} = Y} m'(L_{1i} \cap L_{2j}) \quad \text{for any } Y \in F$$

and m'(L1i ∩ L2j), for i = 1, ..., n1 ; j = 1, ..., n2, satisfies 

$$\sum_j m'(L_{1i} \cap L_{2j}) = m_1(L_{1i}) \quad \text{for } i = 1, \dots, n_1$$

$$\sum_i m'(L_{1i} \cap L_{2j}) = m_2(L_{2j}) \quad \text{for } j = 1, \dots, n_2$$

$$m'(L_{1i} \cap L_{2j}) = 0 \quad \text{if } L_{1i} \cap L_{2j} = \emptyset, \text{ for } i = 1, \dots, n_1 ;\; j = 1, \dots, n_2$$


The problem of determining the mass assignment m is an assignment problem as 
depicted in the following diagram 

[Tableau diagram: the rows are the focal elements L1i of evidence 1 with masses m1(L1i); the columns are the focal elements L2j of evidence 2 with masses m2(L2j).] 

If there are more than two evidences to combine then they are combined two at a 
time. For example to combine (m1, F1), (m2, F2), (m3, F3) and (m4, F4) use 
(m, F) = (((m1, F1) ⊕ (m2, F2)) ⊕ (m3, F3)) ⊕ (m4, F4) 

The set of labels in a cell is the intersection of the subset of labels of evidence 1 associated 
with the row of the cell and the subset of labels of evidence 2 associated with the 
column of the cell. 

The mass assignment entry in a cell is 0 if the intersection of the subset of labels of 
evidence 2 associated with the column of the cell and the subset of labels of evidence 
1 associated with the row of the cell is empty. 

The mass assignment in a cell is associated with the subset of labels in the cell. 

The sum of the cell mass assignment entries in a row must equal the mass 
assignment associated with ml in that row. 

The sum of the cell mass assignment entries in a column must equal the mass 
assignment associated with m2 in that column. 

If there are no loops, where a loop is formed by a movement from a non zero 
assignment cell to other non zero assignment cells by alternative vertical and 
horizontal moves returning to the starting point, then the general assignment problem 
gives a unique solution for the mass assignment cell entries. If a loop exists then it is 
possible to add and subtract a quantity from the assignment values around the loop 
without violating the row and column constraints and the solution will not then be 
unique. If a non-unique solution exists then the family of solutions can be 
parametrised with known constraints on the parameter values. These possible 
parameter values must be taken into account when determining support pairs from 
the necessary and possible support measures. 
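When the tableau has no loop, the cell entries can be found by repeatedly filling any row or column that has a single undetermined cell. The sketch below is our own illustration of that idea, not the FRIL implementation. It treats every non-empty intersection as a potentially non-zero cell, so it reports a loop more conservatively than the definition above, and it simply raises an error instead of parametrising the family of solutions. Mass assignments are dictionaries from frozensets of labels to masses, with "~" for negation.

```python
def general_assignment(m1, m2, tol=1e-9):
    """Combine two mass assignments when the tableau of allowed cells is loop-free."""
    rows, cols = list(m1), list(m2)
    # Cells with an empty intersection are fixed at zero; the others are unknown.
    unknown = {(r, c) for r in rows for c in cols if r & c}
    row_left, col_left = dict(m1), dict(m2)   # mass not yet placed in each row/column
    combined = {}
    while unknown:
        progress = False
        for axis, line in [(0, r) for r in rows] + [(1, c) for c in cols]:
            free = [cell for cell in unknown if cell[axis] == line]
            if len(free) != 1:
                continue
            r, c = free[0]   # the only open cell in this row or column
            value = row_left[r] if axis == 0 else col_left[c]
            if value < -tol:
                raise ValueError("evidences are inconsistent")
            row_left[r] -= value
            col_left[c] -= value
            unknown.discard((r, c))
            if value > tol:
                combined[r & c] = combined.get(r & c, 0.0) + value
            progress = True
        if not progress:
            raise ValueError("tableau contains a loop: the solution is not unique")
    if any(abs(v) > tol for v in list(row_left.values()) + list(col_left.values())):
        raise ValueError("evidences are inconsistent")
    return combined


# Evidences (1) and (2) for the rule fly(X) :- bird(X), with the illustrative value k = 0.5.
k = 0.5
m1 = {frozenset({"b~pf", "b~p~f", "bp~f"}): k,
      frozenset({"~b~pf", "~b~p~f"}): 1 - k}
m2 = {frozenset({"b~pf"}): 0.9 * k,
      frozenset({"~b~pf", "~b~p~f", "b~p~f", "bp~f"}): 1 - 0.9 * k}
result = general_assignment(m1, m2)
for focal, mass in result.items():
    print(sorted(focal), round(mass, 4))
```

For this input the three focal elements receive 0.9k, 0.1k and 1-k, matching the combined assignment quoted in section 2.2.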

3.2 The Bird Penguin Example Revisited 

Consider the example discussed above of combining the two evidences 

(1) {b¬p_, bp¬f} : k , {¬b¬p_} : 1-k 

(2) {b¬pf} : 0.9k , {¬b¬p_, b_¬f} : 1-0.9k 

Using the general assignment method we obtain 

                          0.9k             1-0.9k 
                          {b¬pf}           {¬b¬p_, b_¬f} 

k    {b¬p_, bp¬f}         {b¬pf}           {b_¬f} 
                          0.9k             0.1k 

1-k  {¬b¬p_}              ∅                {¬b¬p_} 
                          0                1-k 

giving the result quoted previously. 

Consider combining the specific evidences expressed as mass assignments over the 
set 

X = {bp, b¬p, ¬bp, ¬b¬p} 
in the example above, namely 
penguin(obj) : [0.4, 0.4]. 
bird(obj) : [0.9, 1]. 

We have 

(1) {_p} : 0.4 , {_¬p} : 0.6 

(2) {b_} : 0.9 , {b_, ¬b_} : 0.1 

Using the general assignment method we obtain 

                0.9                  0.1 
                b                    {b, ¬b} 

0.4   p         bp : 0.4 - x         {_p} : x 

0.6   ¬p        b¬p : 0.5 + x        {_¬p} : 0.1 - x 

where 0 ≤ x ≤ 0.1 

An abbreviated form of labelling is used for convenience. 

The necessary and possible supports for the various elements of X are given by 

Sn(bp) = 0.4 - x ; Sp(bp) = 0.4 

Sn(b¬p) = 0.5 + x ; Sp(b¬p) = 0.6 

Sn(¬bp) = 0 ; Sp(¬bp) = x 

Sn(¬b¬p) = 0 ; Sp(¬b¬p) = 0.1 - x 

from which we can calculate the support pairs 

bp : [0.3, 0.4] ; b¬p : [0.5, 0.6] ; ¬bp : [0, 0.1] ; ¬b¬p : [0, 0.1] 
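These support pairs can be checked numerically. The sketch below is an illustration with names of our own choosing: it encodes the parametrised family of solutions from the tableau, and takes the minimum necessary support and maximum possible support over the end points of 0 ≤ x ≤ 0.1, which suffices here because every entry is linear in x.

```python
def family(x):
    """Mass assignment from the tableau for a given loop parameter x ('~' is negation)."""
    return {
        frozenset({"bp"}): 0.4 - x,
        frozenset({"bp", "~bp"}): x,            # {_p}
        frozenset({"b~p"}): 0.5 + x,
        frozenset({"b~p", "~b~p"}): 0.1 - x,    # {_~p}
    }


def sn(m, A):
    return sum(mass for B, mass in m.items() if B <= A)


def sp(m, A):
    return sum(mass for B, mass in m.items() if B & A)


def support_pair(label, xs=(0.0, 0.1)):
    """[MIN Sn, MAX Sp] over the end points of the admissible range of x."""
    A = frozenset({label})
    low = min(sn(family(x), A) for x in xs)
    high = max(sp(family(x), A) for x in xs)
    return round(low, 10), round(high, 10)


for label in ("bp", "b~p", "~bp", "~b~p"):
    print(label, support_pair(label))
```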

3.3 Another Example 

Consider the example: 

90% of birds can fly (1) 

No penguins can fly (2) 

All penguins are birds (3) 

70% of objects exhibited are birds (4) 

5% of objects exhibited are penguins (5) 

which in program form would be 

fly(X) :- bird(X) : [0.9, 0.9]. 
fly(X) :- penguin(X) : [0, 0]. 
bird(X) :- penguin(X) : [1, 1]. 
bird(X) : [0.7, 0.7]. 
penguin(X) : [0.05, 0.05]. 

We can ask the following question: what is the probability that an object drawn at 
random can fly? 

To answer the query we combine the following mass assignments over the label set 
L' using the general assignment method. 

(1) {b¬p¬f, b¬pf, bp¬f} : 0.7 ; {¬b¬p¬f, ¬b¬pf} : 0.3 using (4) 

(2) {bp¬f} : 0.05 ; {¬b¬p¬f, ¬b¬pf, b¬p¬f, b¬pf} : 0.95 using (5) 

(3) {b¬pf} : 0.9k ; {¬b¬p¬f, ¬b¬pf, b¬p¬f, bp¬f} : 1 - 0.9k using (1) 
where k is the probability assigned to {b¬p¬f, b¬pf, bp¬f} 

For this particular example the solution is easily found by elementary analysis to be 
{b¬pf} : 0.63 
{bp¬f} : 0.05 
{b¬p¬f} : 0.02 
{¬b¬pf, ¬b¬p¬f} : 0.3 

so that the answer to the query is that Pr{X can fly} lies in the interval [0.63, 0.93], 
since the 0.3 associated with {¬b¬pf, ¬b¬p¬f} could all be associated with 
¬b¬pf although this is not necessarily the case. This conclusion is expressed in the 
form of a support pair. We can obtain this result using the general assignment method as 
follows: 

Combining (1) and (2) gives 

                                  Evidence 2 
                            0.05            0.95 
                            {bp¬f}          {¬b¬p_, b¬p_} 
Evidence 1 
0.7 : {b¬p_, bp¬f}          {bp¬f}          {b¬p_} 
                            0.05            0.65 

0.3 : {¬b¬p_}               ∅               {¬b¬p_} 
                            0               0.3 

The mass assignment resulting from combining (1) and (2) is thus 
{bp¬f} : 0.05 
{b¬pf, b¬p¬f} : 0.65 
{¬b¬pf, ¬b¬p¬f} : 0.3 

This is now combined with (3) as follows 

                                  Evidence 3 
                            0.9k            1 - 0.9k 
                            {b¬pf}          {¬b¬p_, b_¬f} 
Evidence 1,2 
0.05 : {bp¬f}               ∅               {bp¬f} 
                            0               0.05 

0.65 : {b¬p_}               {b¬pf}          {b¬p¬f} 
                            0.9k            0.65 - 0.9k 

0.3 : {¬b¬p_}               ∅               {¬b¬p_} 
                            0               0.3 

where 0.9k + 0.05 + (0.65 - 0.9k) = k 

so that k = 0.7 giving the combined mass assignment 

{bp¬f} : 0.05 
{b¬pf} : 0.63 
{b¬p¬f} : 0.02 
{¬b¬pf, ¬b¬p¬f} : 0.3 
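The arithmetic of this example can be checked with a few lines of code. This is only a numerical check of the figures quoted above, with variable names of our own and "~" for negation; it does not re-run the general assignment method.

```python
birds, penguins, fly_given_bird = 0.7, 0.05, 0.9

# k is the mass on the bird labels; the constraint 0.9k + 0.05 + (0.65 - 0.9k) = k gives k = 0.7.
k = penguins + (birds - penguins)

combined = {
    frozenset({"b~pf"}): fly_given_bird * k,                   # flying birds
    frozenset({"bp~f"}): penguins,                             # penguins, which cannot fly
    frozenset({"b~p~f"}): k - penguins - fly_given_bird * k,   # other non-flying birds
    frozenset({"~b~pf", "~b~p~f"}): 1 - k,                     # non-birds, flight unknown
}

can_fly = frozenset({"b~pf", "~b~pf"})
lower = sum(mass for focal, mass in combined.items() if focal <= can_fly)
upper = sum(mass for focal, mass in combined.items() if focal & can_fly)
print(round(lower, 4), round(upper, 4))
```

The lower value is the necessary support for flying and the upper value the possible support, which together form the support pair for the query.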


4. ITERATIVE ASSIGNMENT METHOD 

4.1 Updating Problem 

Suppose an apriori mass assignment m_a is given over the focal set A whose 
elements are members of the power set P(X) where X is a set of labels. This 
assignment represents general tendencies and is derived from statistical 
considerations of some sample space or general rules applicable to such a space. 

Suppose we also have a set of specific evidences {E1, E2, ..., En} where for each i, 
Ei is (mi, Fi) where Fi is the set of focal elements of P(X) for Ei and mi is the mass 
assignment for these focal elements. These evidences are assumed to be relevant to 
some object and derived by consideration of this object alone and not influenced by 
the sample space of objects from which the object came. 

We wish to update the apriori assignment m_a with {E1, ..., En} to give the updated 
mass assignment m such that the minimum information principle concerned with the 
relative information of m given m_a is satisfied. 

4.2 Algorithm for Simple Case 

The minimum information principle 

Let p be an apriori distribution defined over the set of labels X. 

Let specific evidences E1, ..., En be given where Ei expresses a probability 
distribution over a partition of X. 

Let p' be a distribution such that 

$$\sum_{x \in X} p'(x) \ln(p'(x)/p(x))$$

is minimised subject to the constraints 
E1, ..., En 

p' is said to satisfy the minimum information principle for updating the distribution p 
over X with specific evidences E1, ..., En where each Ei is expressed as a 
distribution over a partition of X. 

The sequential iterative assignment algorithm updates p using E1 to obtain the 
update p1 satisfying the minimum information principle, which is similarly updated 
using E2 to obtain p2, and so on until the update is made using En to obtain pn. At each stage 
of this process only the evidence used for updating is necessarily satisfied and 
previous evidences used will no longer be necessarily satisfied. We therefore replace 
the apriori with pn and repeat the process. The iteration is continued until a pn is 
found which satisfies all the evidences E1, ..., En. 

This iterative process in fact converges to the solution which satisfies the minimum 
information principle of minimising the relative information with respect to the 
apriori p subject to the constraints E1, ..., En. The multi constraint optimisation 
problem is therefore solved by a succession of single constraint optimisation 
problems and iterating. 

The single constraint optimisation problem has a particularly simple algorithm for its 
solution which we will now consider. 

Let p(r-1) = p, say, be updated to p(r) = p', say, using Er with the following 
algorithm which gives a p' such that 

$$\sum_{x \in X} p'(x) \ln\{p'(x)/p(x)\}$$

is minimised subject to the constraint Er being satisfied. 

Let the partition for evidence Er be {X1, ..., Xk} with probability distribution 
{Pr(Xi)} given. 

Let 

$$K_i = \sum_{x \in X_i} p(x) \quad \text{for } i = 1, \dots, k$$

then 

$$p'(x) = p(x) \Pr(X_i) / K_i \quad \text{for } x \in X_i$$

for every label x of X. 

The algorithm is particularly simple when the apriori is expressed as a probability 
distribution over the set of labels and the evidences as a distribution over a partition 
of the set of labels. 
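The single-constraint update and the surrounding iteration can be sketched as follows. This is a minimal illustration under our own assumptions: the function names are hypothetical, each evidence is a list of (block, probability) pairs whose blocks partition the labels, the uniform apriori is made up for the example, and evidence blocks with zero apriori probability are not handled properly.

```python
def update(p, evidence):
    """Rescale p inside each block Xi so that the block receives probability Pr(Xi)."""
    new_p = {}
    for block, pr in evidence:
        K = sum(p[x] for x in block)   # apriori probability of the block
        for x in block:
            new_p[x] = p[x] * pr / K if K > 0 else 0.0
    return new_p


def iterative_assignment(p, evidences, tol=1e-12, max_rounds=1000):
    """Cycle through the evidences until a full round leaves p unchanged."""
    for _ in range(max_rounds):
        previous = p
        for evidence in evidences:
            p = update(p, evidence)
        if max(abs(p[x] - previous[x]) for x in p) < tol:
            break
    return p


# Hypothetical uniform apriori over the labels BP ('~' is negation).
apriori = {"bp": 0.25, "b~p": 0.25, "~bp": 0.25, "~b~p": 0.25}
bird = [(("bp", "b~p"), 0.9), (("~bp", "~b~p"), 0.1)]
penguin = [(("bp", "~bp"), 0.4), (("b~p", "~b~p"), 0.6)]

posterior = iterative_assignment(apriori, [bird, penguin])
print({label: round(value, 4) for label, value in posterior.items()})
```

Because the apriori in this example treats B and P as independent, a single pass already satisfies both evidences; in general several passes are needed.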

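We can give a pictorial representation for this algorithm: 

                  Pr(Xj) 
                  Xj = {..., li, ...} 

pi : li           li : Kj.pi.Pr(Xj)       ∅ : 0 for all other cells in this row 

                  ∅ : 0 if label in row is not in Xj 

where, in the tableau, Kj denotes the multiplier 

$$K_j = \frac{1}{\sum_{k \,:\, l_k \in X_j} p_k}$$

that is, the reciprocal of the block sum used in the formula above. 

4.3 Generalisation for Evidences Expressed as Mass Assignments 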


If the evidences are expressed as mass assignments over X with the apriori 
assignment still being a probability distribution over the set of labels X then a more 
complicated case must be considered. 

Let p(r-1) = p be denoted as in 4.2 and let Er be the mass assignment 
{Xrk : mrk, for k = 1, ..., nr} 

where Xrk is a subset of X for all k and mrk is the mass assigned to Xrk. Then 

$$p'(x) = \sum_{k \,:\, x \in X_{rk}} p(x)\, m_{rk}\, K_k$$

for every label x of X, where 

$$K_k = \frac{1}{\sum_{s \,:\, l_s \in X_{rk}} p_s}$$
We can express this in pictorial form as: 

Apriori        mrk : Xrk               mrq : Xrq               Update 

pi : li        li : Kk.pi.mrk = tk     li : Kq.pi.mrq = tq     p'i = tk + tq plus any other cases 

(similarly for any other column k cells such that li ∈ Xrk) 

$$K_k = \frac{1}{\sum_{s \,:\, l_s \in X_{rk}} p_s} , \qquad K_q = \frac{1}{\sum_{s \,:\, l_s \in X_{rq}} p_s}$$

Each column constraint is satisfied. The labels in the tableau cells are all labels of X 
since the focal elements of Er when intersected with a label in the apriori gives the 
apriori label. The update of this label is the sum of all the cell assignments associated 
with this label. A cell in a row where the apriori label is not a member of the cell 
column focal element of Er has a zero mass assignment associated with the empty 
set. 

In this case the update solution p' satisfies the following relative information 
optimisation problem: the relative information 

$$\sum_{x \in X} p'(x) \ln(p'(x)/p(x))$$

is minimised subject to the constraints 

Sn(Y) ≤ p'(Y) ≤ Sp(Y) for all Y in P(X) 

where Sn(Y) and Sp(Y) are determined from the mass assignment (mr, Fr). 
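A sketch of this update, with the evidence given as a mass assignment and the apriori as a probability distribution over labels, is shown below. The representation (dictionaries, frozensets, "~" for negation), the function name and the uniform apriori are our own assumptions for illustration.

```python
def update_with_mass_evidence(p, evidence):
    """p'(x) = sum over focal elements Xrk containing x of p(x) * mrk / (sum of p over Xrk)."""
    new_p = {x: 0.0 for x in p}
    for focal, mass in evidence.items():
        total = sum(p[x] for x in focal)   # 1 / Kk in the notation of the text
        if total == 0:
            continue                        # focal element unsupported by the apriori
        for x in focal:
            new_p[x] += p[x] * mass / total
    return new_p


# Hypothetical uniform apriori, updated with bird(obj) : [0.9, 1].
labels = ("bp", "b~p", "~bp", "~b~p")
apriori = {x: 0.25 for x in labels}
evidence = {frozenset({"bp", "b~p"}): 0.9, frozenset(labels): 0.1}

posterior = update_with_mass_evidence(apriori, evidence)
print({x: round(v, 4) for x, v in posterior.items()})
print(round(posterior["bp"] + posterior["b~p"], 4))
```

The updated probability of bird falls inside the support pair [0.9, 1], in line with the constraints Sn(Y) ≤ p'(Y) ≤ Sp(Y).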


4.4 Generalisation for Both Apriori and Evidences Expressed as Mass Assignments 

In this case the intersection of the row subset of labels of the apriori assignment with 
the column subset of labels of the evidence assignment for a given cell of the tableau 
is an element of P(X). In the case when this intersection is the empty set the mass 
assignment for that cell is zero. When the intersection is not empty then the mass 
assignment is the product of the row apriori assignment and the evidence column 
assignment scaled with the K multiplier for that column. The K multiplier for a 
column is the reciprocal of the sum of the apriori row assignments corresponding to those cells of the 
tableau in the column which have non-empty label intersections. The update is a 
mass assignment over the set of subsets of labels of the cells of the tableau. The 
update can therefore be over a different set of subsets of labels to that of the apriori. 
Iteration proceeds as before and convergence will be obtained with a mass 
assignment over P(X) which will correspond to a family of possible probability 
distributions over X. 

We can give a pictorial view of the algorithm: 



The apriori assignment was also a family of possible probability distributions. For 
any one of these, the calculation is that of 4.3 and satisfies the minimum information 
principle where the constraints are in terms of necessary and possible supports from 
the evidence assignments. A member of the apriori set of possible probability 
distributions over the label set X will be updated with the evidences El, ..., Er to a 
final probability distribution over X satisfying the minimum information principle. 
Each member of the set of possible apriori probability distributions will be updated, 
in general to a different final probability distribution. The final set of distributions 
can be expressed as an assignment over P (X). This is what the algorithm described 
above does in this case. The calculation is no more involved than for the simpler 
cases, apart from having to determine the intersections for finding the subset 
of labels for each cell and taking note of these in the final update. 

The examples which follow will illustrate the method. 

In this case the updating tableau can contain loops in a similar manner to the general 
assignment case. The loop can be treated in exactly the same way as for the general 
assignment method. This will be illustrated in the examples that follow. Each 
solution of the loop satisfies the minimum information principle. 

5. EXAMPLES OF USE OF ITERATIVE ASSIGNMENT METHOD 

5.1 A Simple Rule System 

a(X) :- b(X), c(X) : [0.9, 0.9], [0, 0].   (1) 

c(X) :- d(X) : [0.85, 1], [0, 0].          (2) 

b(mary) : [0.8, 0.8].                      (3) 

d(mary) : [0.95, 0.95].                    (4) 

This is a simple FRIL type program. X is a variable and a, b, c, d are predicates. The 
first two sentences are rules which express general statements about persons and the 
third and fourth are facts about a specific person mary. 

If a rule contains one list after the colon then this gives the interval containing the 
probability of the head of the rule given the body of the rule is true. If the list after 
the colon contains two lists then the first gives the probability of the head of the rule 
given the body of the rule is true and the second gives the interval for the probability 
of the head of the rule given the body of the rule is false. 

The first rule says that for any person X the probability that (X is a) given that (X is 
b) and (X is c) is 0.9. This expresses the fact that 90% of persons who are both b and 
c are also a. It also says that (X is a) cannot be true unless both b and c are satisfied. 

The second rule says that at least 85% of persons who are d are also c while no 
person who is not a d can be a c. 

We can ask the query in FRIL 
?- a(mary). 

to determine the support pair for (Mary is a) 

In this example the rules are used to determine a family of apriori assignments over 
the two sets of labels 

{ABC} and {CD} 

where A, B, C, D denote a or ¬a, b or ¬b, c or ¬c, d or ¬d respectively. Rule (2) 
can be used to construct a family of apriori assignments for {CD} which can be 
updated using (4) for the specific person Mary and from this a support pair for 
Pr{c(mary)} determined. This can be used with (3) to update a family of apriori 
assignments determined from (1) for the set of labels {ABC}. From this update 
Pr{a(mary)} can be determined. 

Alternatively we could update the family of apriori assignments over the set of labels 
{ABCD} constructed using rules (1) and (2) with specific evidences (3) and (4) and 
determine Pr{a(mary)} from the final update. 

These two approaches are equivalent and the first approach decomposes the problem 
of finding Pr{a(mary)} into two sub problems. Decomposition will not be discussed 
in detail in this paper but it is important to reduce the computational burden 
associated with updating over a large set of labels. 

The computation of the first approach is shown below. 

Labels with apriori assignment 
y1 cd 
y2 c¬d 
y3 ¬cd 
y4 ¬c¬d 

Rule (2) constrains {yi} such that 

y1/(y1 + y3) ∈ [0.85, 1] and y2/(y2 + y4) = 0 

so that 

y1 = [0.85, 1]k where k = y1 + y3 for 0 ≤ k ≤ 1 

y2 = 0 

Thus 

{_d} : k , {¬c¬d} : 1-k    (5) 

{cd} : 0.85k , {¬c_} : 1-k , {cd, ¬c_} : 0.15k    (6) 

(5) and (6) can be combined using the general assignment method to form a family 
of apriori assignments over this set of labels 

                    0.85k            1-k                 0.15k 
                    {cd}             {¬c_}               {cd, ¬c_} 

k : {_d}            {cd} : 0.85k     {¬cd} : 0+x         {_d} : 0.15k-x 

1-k : {¬c¬d}        ∅ : 0            {¬c¬d} : 1-k-x      {¬c¬d} : 0+x 

where 0 ≤ x ≤ MIN[0.15k, 1-k] 

giving apriori assignment 

{cd} : 0.85k , {¬cd} : x , {¬c¬d} : 1-k , {_d} : 0.15k-x 

(the two {¬c¬d} cells, 1-k-x and x, sum to 1-k), which can be updated using 

{_d} : 0.95 , {_¬d} : 0.05 

by the iterative assignment method as follows 

Apriori               0.95                   0.05       Update 
                      {_d}                   {_¬d} 

0.85k : {cd}          0.8075                 0          0.8075 
x : {¬cd}             0.95x / k              0          0.95x / k 
1-k : {¬c¬d}          0                      0.05       0.05 
0.15k-x : {_d}        0.1425 - (0.95x/k)     0          0.1425 - (0.95x/k) 

K's                   1/k                    1/(1-k) 

From this we calculate 

Pr{c(mary)} ∈ [0.8075, MAX{0.95 - (0.95x/k)}] 

The upper limit is maximum for x = 0, so that 
Pr{c(mary)} ∈ [0.8075, 0.95] 
which is to be used for the next stage. 
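The update tableau for {CD} can be reproduced numerically. The sketch below applies the cell rule of section 4.4 (row mass times column mass, divided by the sum of the row masses that intersect the column) for chosen numeric values of k and x. The function names and the particular values k = 0.6, x = 0.05 are our own illustrative assumptions, and "~" stands for negation.

```python
def update_mass(prior, evidence):
    """One tableau update when both apriori and evidence are mass assignments."""
    updated = {}
    for col, m_col in evidence.items():
        hits = [(row, m_row) for row, m_row in prior.items() if row & col]
        total = sum(m_row for _, m_row in hits)   # reciprocal of the K multiplier
        if total == 0:
            continue                               # column unsupported by the apriori
        for row, m_row in hits:
            cell = row & col
            updated[cell] = updated.get(cell, 0.0) + m_row * m_col / total
    return updated


def support_pair_for_c(k, x):
    """Necessary and possible support for c(mary) after updating with d(mary) : [0.95, 0.95]."""
    prior = {
        frozenset({"cd"}): 0.85 * k,
        frozenset({"~cd"}): x,
        frozenset({"~c~d"}): 1 - k,
        frozenset({"cd", "~cd"}): 0.15 * k - x,    # {_d}
    }
    evidence = {
        frozenset({"cd", "~cd"}): 0.95,            # {_d}
        frozenset({"c~d", "~c~d"}): 0.05,          # {_~d}
    }
    posterior = update_mass(prior, evidence)
    c = frozenset({"cd", "c~d"})
    sn = sum(v for s, v in posterior.items() if s <= c)
    sp = sum(v for s, v in posterior.items() if s & c)
    return sn, sp


print(support_pair_for_c(0.6, 0.05))
print(support_pair_for_c(0.6, 0.0))
```

The necessary support does not depend on k or x, while the possible support equals 0.95 - 0.95x/k and reaches its maximum at x = 0, as stated above.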

Labels and Apriori 
y1 abc 
y2 ¬abc 
y3 ¬ab¬c 
y4 ¬a¬bc 
y5 ¬a¬b¬c 

{ab¬c, a¬bc, a¬b¬c} : 0 from rule (1) 

Also y1 / (y1 + y2) = 0.9 

so that y1 = 0.9k where k = y1 + y2 

i.e. we must combine 

1. {abc} : 0.9k , {¬a__} : 1-0.9k 

2. {_bc} : k , {¬ab¬c, ¬a¬b_} : 1-k 

The general assignment method gives 

                            0.9k              1-0.9k 
                            {abc}             {¬a__} 

k : {_bc}                   {abc} : 0.9k      {¬abc} : 0.1k 

1-k : {¬ab¬c, ¬a¬b_}        ∅ : 0             {¬ab¬c, ¬a¬b_} : 1-k 

giving the apriori family of assignments over the labels {ABC} as 
0.9k {abc} 
0.1k {¬abc} 
1-k {¬ab¬c, ¬a¬b_} 

which is updated using 

Specific Evidence 1 :- {_b_} : 0.8 , {_¬b_} : 0.2 

Specific Evidence 2 :- {__c} : 0.8075 , {__¬c} : 0.05 , {___} : 0.1425 


[Tableau diagram, summarised: the apriori family 0.9k : {abc}, 0.1k : {¬abc}, 1-k : {¬ab¬c, ¬a¬b_} 
is updated using Specific Evidence 1 to give an update over {abc}, {¬abc}, {¬ab¬c}, {¬a¬b_}. 
This is updated using Specific Evidence 2 to give an update over {abc}, {¬abc}, {¬ab¬c}, 
{¬a¬bc}, {¬a¬b¬c}, {¬a¬b_}. The process then iterates, updating over these six focal elements 
using Specific Evidence 1 and Specific Evidence 2 in turn.] 

For all values of k this will give an interval for {abc} and thus a. We leave the actual 
calculation to the reader. The result of this calculation is 
a : (0.6675 0.72) 

so that Pr{a(mary)} ∈ [0.6675, 0.72]. 

The interval for bc can also be calculated and this is [0.6075, 0.8]. It should be noted 
that if only the answer for a(mary) is required the last 5 rows of the last two tables 
can be collapsed into one row with the assignment for this row equal to the sum of 
the assignments of the five rows. This simplifies the calculation process. 

In this example each stage of the process retains the information given by the 
appropriate rule. For example in the final table Pr{a(mary) | b(mary), c(mary)} = 0.9. 
This simply means that there is a member of the family of apriori assignments which 
can satisfy the specific evidences. For this example several steps are required for the 
final iteration to converge. This is because of the imprecision found for Pr{c(mary)}. 
If a point value was used for Pr{c(mary)} then the iteration would have converged in 
one step. In a later section we will deal with the non-monotonic logic case in which 
the specific evidences are inconsistent with the family of apriori assignments. 

5.2 Three clowns example 

Three clowns stood in line. Each clown was either a man or a woman. The audience 
was asked to vote on each of the first and last clown being male. 90% voted that the 
first clown, the one on the left, was a man and 20% thought the third clown, the one 
on the far right, was a man. Nothing was recorded about the middle clown. What is 
the probability that a male clown stands next to a female clown with the male on the 
left? 

If it were known for sure that the first was male and the third was female then a male
would certainly be standing next to a female with the male on the left. This problem
can be expressed in first order logic and the theorem proved by case analysis. The
refutation resolution method popular in computer theorem proving programs could
also be used but is much more cumbersome. The problem posed above is a
probabilistic version of this.

The set of labels for this problem is
{mmm, mmf, mfm, mff, fmm, fmf, ffm, fff}

Two evidences have been supplied:
Evidence 1 :- {mmm, mmf, mfm, mff} : 0.9
Evidence 2 :- {mmm, mfm, fmm, ffm} : 0.2


We can combine these two evidences using the general assignment method

                         Evidence 2
                         0.2 : {_ _ m}           0.8 : {_ _ f}
Evidence 1

0.9 : {m _ _}            {m _ m} : x             {m _ f} : 0.9 - x

0.1 : {f _ _}            {f _ m} : 0.2 - x       {f _ f} : x - 0.1

0.1 ≤ x ≤ 0.2

Therefore the support pair for the statement S = “clowns of opposite sex stand next
to each other with a male on the left of the pair” = [MIN(0.9 - x), MAX(0.8 + x)] =
[0.7, 1].
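
The support pair can be checked numerically. The following Python sketch is illustrative only. It assumes that the necessary support of S is the mass on those cells in which every label satisfies S, and that the possible support is the mass on those cells in which at least one label satisfies S. The helper names (`s_holds`, `cell`, `support_pair`) are hypothetical and do not come from the paper.

```python
from itertools import product

# The eight labels in the order used in the text: mmm, mmf, mfm, mff, fmm, ...
LABELS = ["".join(t) for t in product("mf", repeat=3)]


def s_holds(label):
    """S: some male stands immediately to the left of a female."""
    return "mf" in label


def cell(first, third):
    """Labels compatible with the sexes of the first and third clown."""
    return [l for l in LABELS if l[0] == first and l[2] == third]


def support_pair(steps=1000):
    """Scan the free parameter x over [0.1, 0.2] and collect the bounds."""
    necessary, possible = [], []
    for i in range(steps + 1):
        x = 0.1 + 0.1 * i / steps
        mass = {
            ("m", "m"): x,
            ("m", "f"): 0.9 - x,
            ("f", "m"): 0.2 - x,
            ("f", "f"): x - 0.1,
        }
        # Mass that certainly supports S: every label in the cell satisfies S.
        necessary.append(sum(v for k, v in mass.items()
                             if all(s_holds(l) for l in cell(*k))))
        # Mass that possibly supports S: some label in the cell satisfies S.
        possible.append(sum(v for k, v in mass.items()
                            if any(s_holds(l) for l in cell(*k))))
    return min(necessary), max(possible)


if __name__ == "__main__":
    print(support_pair())
```

Only the cell {m _ f} guarantees S, and only the cell {f _ m} excludes it, which is why the bounds reduce to 0.9 - x and 0.8 + x.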


We now consider this example using the iterative assignment method. Above we 
used specific information about the three clowns in line. We did not use any apriori 
information concerning clowns in general. In fact the apriori information we 
assumed was of the form 
{mmm, mmf, mfm, mff, fmm, fmf, ffm, fff} : 1

This mass assignment could be used with the iterative assignment method using the 
specific information given for updating as follows 


                         0.9                     0.1
                         {m _ _}                 {f _ _}

1 : {m _ _, f _ _}       {m _ _} : 0.9           {f _ _} : 0.1

K’s                      1                       1


                         0.2                     0.8
                         {_ _ m}                 {_ _ f}

0.9 : {m _ _}            {m _ m} : 0.18          {m _ f} : 0.72

0.1 : {f _ _}            {f _ m} : 0.02          {f _ f} : 0.08

K’s                      1                       1


Final Update is
0.18 : {m _ m}
0.72 : {m _ f}
0.02 : {f _ m}
0.08 : {f _ f}

since {m _ _} : 0.9 is satisfied so that both updating evidences are satisfied.

The loop in this final mass assignment means that we can add and subtract around
the loop without destroying the constraints and all these solutions satisfy the
minimum relative entropy criterion with respect to some apriori assignment in the set
of all possible apriori assignments. The solution which is produced by the iterative
assignment method before any adding and subtracting around the loop is performed
is that corresponding to the maximum entropy apriori assignment, ie. that member of
the set of possible apriori assignments corresponding to maximum entropy.

To obtain the necessary support for the statement S we must minimise the
assignment given to {m _ f} so that we use the assignment
0.2 : {m _ m}
0.7 : {m _ f}
0   : {f _ m}
0.1 : {f _ f}

since 0.02 is the maximum value we can subtract from 0.72; otherwise the entry
in the cell with assignment 0.02 would go negative.

The possible support for S is obtained by maximising the assignment given to
{m _ m, m _ f, f _ f} ie minimising the assignment given to {f _ m}. This is also
satisfied by this last assignment giving the support pair [0.7, 1] for S.

The above analysis is equivalent to using the iterative assignment with all apriori
distributions over the label set {mmm, mmf, mfm, mff, fmm, fmf, ffm, fff} which
will allow both specific evidences to be retained when using the iterative assignment
method. For example

The apriori {0, 0, 0.25, 0.25, 0, 0, 0.25, 0.25} will give 0.9
The apriori {0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0} will give 0.8
The apriori {0, 0, 0.1, 0.4, 0, 0, 0.25, 0.25} will give 0.9
The apriori {0, 0.7, 0.2, 0, 0, 0.1, 0, 0} gives 1
The apriori {0.2, 0.4, 0, 0.3, 0, 0, 0, 0.1} gives 0.7
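
The apriori examples listed for the three clowns can be reproduced with a small script. This is a sketch under a stated assumption: for point-valued apriori assignments, the iterative assignment update is taken to behave like iterative proportional fitting, that is, the labels are rescaled to match each specific evidence in turn. That substitution, and the names `update` and `prob_s`, are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

# Labels in the order used in the text: mmm, mmf, mfm, mff, fmm, fmf, ffm, fff
LABELS = ["".join(t) for t in product("mf", repeat=3)]

# Specific evidences as (clown position, probability that this clown is male).
EVIDENCE = [(0, 0.9), (2, 0.2)]


def update(prior, sweeps=200):
    """Rescale the apriori assignment to each evidence in turn, repeatedly."""
    p = dict(zip(LABELS, prior))
    for _ in range(sweeps):
        for position, pr_male in EVIDENCE:
            for sex, target in (("m", pr_male), ("f", 1.0 - pr_male)):
                group = [l for l in LABELS if l[position] == sex]
                total = sum(p[l] for l in group)
                # A group with zero apriori mass cannot be rescaled; the
                # priors below never need that, so it is simply skipped.
                if total > 0:
                    for l in group:
                        p[l] *= target / total
    return p


def prob_s(prior):
    """Probability of S (a male immediately left of a female) after updating."""
    updated = update(prior)
    return sum(v for l, v in updated.items() if "mf" in l)


PRIORS = [
    [0, 0, 0.25, 0.25, 0, 0, 0.25, 0.25],
    [0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0],
    [0, 0, 0.1, 0.4, 0, 0, 0.25, 0.25],
    [0, 0.7, 0.2, 0, 0, 0.1, 0, 0],
    [0.2, 0.4, 0, 0.3, 0, 0, 0, 0.1],
]

if __name__ == "__main__":
    for prior in PRIORS:
        print(prior, round(prob_s(prior), 4))
```

Under this assumption the five priors give values that agree with the list above, and all of them lie inside the support pair [0.7, 1].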
6. NON MONOTONIC REASONING
6.1 WHY SHOULD THERE BE A PROBLEM?

Consider the following example. Population statistics tell us that a thirty year old 
Englishman has a very high probability of living another 5 years. The statistics also 
tell us that a thirty year old Englishman who has lung cancer only has a small 
probability of living another 5 years. We are told that John is a thirty year old
Englishman. We can conclude that it is very probable that he will live another 5
years. If we are later told that he has lung cancer then we conclude he has little
chance of living another 5 years. What we could conclude before this additional
piece of information was given we can no longer conclude. From a logical point of
view it appears that we can approximate this situation by asserting each proposition
which has a high probability and asserting the negation of each proposition which
has a low probability. Thus we have

∀x {Englishman(x) ∧ InThirties(x)} ⊃ Live5yrsmore(x)

∀x {Englishman(x) ∧ InThirties(x) ∧ Cancer(x)} ⊃ ¬Live5yrsmore(x)

If Englishman(John) ∧ InThirties(John)

then we conclude Live5yrsmore(John)

If Englishman(John) ∧ InThirties(John) ∧ Cancer(John)

then we conclude ¬Live5yrsmore(John)
showing a nonmonotonic behaviour.

Thus situations like the above seem to cause difficulties if we try to model them
using first order predicate logic.

From a probabilistic point of view there is no problem. We are told that

Pr{Live5yrsmore(x) | Englishman(x) ∧ InThirties(x)} is high ; for any x            (1)

Pr{Live5yrsmore(x) | Englishman(x) ∧ InThirties(x) ∧ Cancer(x)} is low ; for any x  (2)

This will not lead to any form of inconsistency. In fact if it is known that
Pr{Live5yrsmore(John) | Englishman(John) ∧ InThirties(John)} is high
this will tell us nothing about
Pr{Live5yrsmore(John) | Englishman(John) ∧ InThirties(John) ∧ Cancer(John)}
which can take any value in the range [0, 1].

We make inferences by selecting the correct sample space using the given specific
information and determine the desired probability using this. In the case of John who
is known to be an Englishman in his thirties the answer for the probability of him
living another 5 years will be “high” if this is all we know about him. If we also
know that he has lung cancer then a different sample space is used and the answer is
“low”.
In terms of the iterative assignment method, the general statements (1) and (2) above
are used to determine a family of apriori assignments which are updated with the
specific evidences concerning John. These specific evidences could themselves be
uncertain, ie probabilistic statements in this case.

The next example illustrates this. 

6.2 Penguin Example

We reconsider this example which was discussed above.

fly(X) bird(X) : [0.9, 0.9].      (1)
bird(X) penguin(X).               (2)
fly(X) penguin(X).                (3)
penguin(obj) : [0.4, 0.4].        (4)
bird(obj) : [0.9, 0.9].           (5)

The rules (1), (2) and (3) define the family of apriori assignments. (2) and (3)
eliminate certain possible labels as discussed above. The labels are:

y1    ¬b¬p¬f
y2    ¬b¬pf
y3    b¬p¬f
y4    b¬pf
y5    bp¬f

so that y4 / (y3 + y4 + y5) = 0.9 and y1 + y2 + y3 + y4 + y5 = 1
If we let

k = y3 + y4 + y5          (6)

then

y4 = 0.9k                 (7)

and the family of apriori for a given k, 0 < k < 1 is

{b¬pf} : 0.9k
{¬b¬p _} : 1-k
{b _ ¬f} : 0.1k

determined by combining (6) and (7) using the general assignment method.

This family of assignments is updated using the specific evidences (4) and (5) with
the iterative assignment method using the scheme


Apriori assignment          Update using          To give update over

0.9k : {b¬pf}                                     {b¬pf}
0.1k : {b _ ¬f}             Pr{b} = 0.9           {b _ ¬f}
1-k  : {¬b¬p _}                                   {¬b¬p _}

{b¬pf}                                            {bp¬f}
{b _ ¬f}                    Pr{p} = 0.4           {b¬pf}
{¬b¬p _}                                          {b¬p¬f}
                                                  {¬b¬p _}

ITERATE: the assignment over {bp¬f}, {b¬pf}, {b¬p¬f} and {¬b¬p _} is updated using
Pr{b} = 0.9 and then Pr{p} = 0.4 in turn.




From the final family of assignments we can determine the support pair for f(obj)
f(k) : [assignment for {b¬pf} , assignment for {b¬pf} + assignment for {¬b¬p _}]

= [0.45, 0.55]

This final support pair is in fact independent of k, so that this is the actual
support pair for “f”

ie. Pr{fly(obj)} ∈ [0.45, 0.55]

6.3 Complete Model for Penguin Example

Consider the program
bird(X) : [0.7, 0.7].
fly(X) bird(X) : [0.9, 0.9].

fly(X) bird(X), penguin(X) : [0, 0][0.95, 0.95][ ][0.1, 0.1].

bird(X) penguin(X) : [1, 1].

fly(X) penguin(X) : [0, 0].

penguin(obj) : [0.4, 0.4].

bird(obj) : [0.9, 0.9].

This program says that

The proportion of birds in the relevant population of objects is 70%. 90% of the birds
can fly. No object which is a bird and a penguin can fly, 95% of birds which are not
penguins can fly, 10% of objects which are not birds can fly. All penguins are birds.
No penguin can fly. The 4 support pairs associated with the third rule correspond to
Pr{fly(X) | bird(X), penguin(X)}, Pr{fly(X) | bird(X), ¬penguin(X)},
Pr{fly(X) | ¬bird(X), penguin(X)}, Pr{fly(X) | ¬bird(X), ¬penguin(X)}
respectively.

It also gives specific information about the object obj, namely that there is a
probability of 0.9 that obj is a bird and a probability of 0.4 that obj is a penguin.

This information allows the following unique distribution over the relevant labels to 
be constructed: 

Apriori

y1 = 0.27       ¬b¬p¬f
y2 = 0.03       ¬b¬pf
y3 = 0.0332     b¬p¬f
y4 = 0.63       b¬pf
y5 = 0.0368     bp¬f

using

y4 / (y3 + y4) = 0.95 ; y2 / (y1 + y2) = 0.1 ; y4 / (y3 + y4 + y5) = 0.9

y3 + y4 + y5 = 0.7 ; y1 + y2 + y3 + y4 + y5 = 1

The iterative assignment update then gives (fly Mary) : (0.485 0.485). 

Intuitive solution 


In this problem we are presented with two pieces of information:

1. Object Mary came from a population with statistics

¬b¬p¬f     0.27
¬b¬pf      0.03
b¬p¬f      0.0332
b¬pf       0.63
bp¬f       0.0368

so that

IF object obj has properties bp then Pr(obj can fly) = 0

IF object obj has properties b¬p then Pr(obj can fly) = 0.63 / 0.6632 = 0.95

IF object obj has properties ¬b¬p then Pr(obj can fly) = 0.03 / 0.3 = 0.1

2. Object properties

2(a) b : 0.9 ; ¬b : 0.1

2(b) p : 0.4 ; ¬p : 0.6

2(c) obj cannot be a penguin and not a bird

Combining 2(a) and 2(b) taking account of 2(c) by allowing only the set of labels

{¬b¬p, b¬p, bp}

using the general assignment method gives

                 p : 0.4                    ¬p : 0.6

b : 0.9          bp : 0.4                   b¬p : 0.5

¬b : 0.1         ¬bp (not allowed) : 0      ¬b¬p : 0.1

giving

bp : 0.4 ; b¬p : 0.5 ; ¬b¬p : 0.1

Expected value of Pr(obj can fly) = 0.4*0 + 0.5*0.95 + 0.1*0.1 = 0.485
the value given by the iterative assignment method.

We can write

P’r(f) = Pr(f | bp)P’r(bp) + Pr(f | b¬p)P’r(b¬p) + Pr(f | ¬b¬p)P’r(¬b¬p)
where Pr(.) signifies a probability determined from the population statistics,
information 1, and P’r(.) signifies a probability determined from the specific
information, information 2 and set of possible labels.

This is a form of Jeffrey’s rule.
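
The arithmetic of the complete penguin model can be retraced in a few lines. The sketch below first recovers the apriori distribution y1, ..., y5 from the stated ratios and then applies the Jeffrey-style sum. The function names, the `~` notation for negation and the dictionary keys are hypothetical choices made for this illustration.

```python
def apriori_distribution(pr_bird=0.7):
    """Solve the five constraints of the complete model for y1..y5."""
    y4 = 0.9 * pr_bird              # y4 / (y3 + y4 + y5) = 0.9
    y3 = y4 / 0.95 - y4             # y4 / (y3 + y4) = 0.95
    y5 = pr_bird - y3 - y4          # y3 + y4 + y5 = 0.7
    y2 = 0.1 * (1.0 - pr_bird)      # y2 / (y1 + y2) = 0.1
    y1 = (1.0 - pr_bird) - y2       # all five masses sum to 1
    return {"~b~p~f": y1, "~b~pf": y2, "b~p~f": y3, "b~pf": y4, "bp~f": y5}


def pr_fly(y, pr_b=0.9, pr_p=0.4):
    """Weight the population conditionals by the object's own label masses."""
    # Pr(f | class), determined from the population statistics.
    conditional = {
        "bp": 0.0,
        "b~p": y["b~pf"] / (y["b~pf"] + y["b~p~f"]),
        "~b~p": y["~b~pf"] / (y["~b~pf"] + y["~b~p~f"]),
    }
    # P'r(class) from the specific evidence; a penguin is always a bird,
    # so the whole penguin mass sits inside the bird mass.
    specific = {"bp": pr_p, "b~p": pr_b - pr_p, "~b~p": 1.0 - pr_b}
    return sum(conditional[c] * specific[c] for c in conditional)


if __name__ == "__main__":
    y = apriori_distribution()
    print({label: round(mass, 4) for label, mass in y.items()})
    print(round(pr_fly(y), 4))
```

The rounded apriori values agree with the table in the text, and the weighted sum reproduces the point value 0.485.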

7. SEMANTIC UNIFICATION 

7.1 Nested Sets, Possibility Distributions and Mass Assignments

Let $X = \{x_1, x_2, \ldots, x_n\}$

Let $A_1, A_2, \ldots, A_n$ be nested subsets of X such that

$A_1 \subset A_2 \subset \cdots \subset A_n$ where $A_i = \{x_1, \ldots, x_i\}$

Let m be a mass assignment over these nested sets

$m(A_i) \ge 0$, all i

$\sum_i m(A_i) = 1$

Let the necessary support and possible support measures for this special case of
nested sets be called necessity and possibility measures denoted by N(.) and P(.)
respectively. It is easy to show that

$P(A \cup B) = \mathrm{MAX}\{P(A), P(B)\}$

$N(A \cap B) = \mathrm{MIN}\{N(A), N(B)\}$

for all $A, B \in \mathcal{P}(X)$

[ZADEH 1978], [KLIR, FOLGER 1988].

Let $\pi_f$ be a function

$\pi_f : X \to [0, 1]$

called the possibility distribution of f over X.

Let $\pi_i = \pi_f(x_i)$ for all $x_i \in X$ and ordered such that

$\pi_1 \ge \pi_2 \ge \cdots \ge \pi_n$

We will only consider normalised possibility distributions corresponding to $\pi_1 = 1$.
Define a mass assignment over the nested sets

$A_1 = \{x_1\},\ A_2 = \{x_1, x_2\},\ \ldots,\ A_n = X$

as

$m_i = m(A_i)$ where $m_i = \pi_i - \pi_{i+1}$ with $\pi_{n+1} = 0$

so that

$P(A_i) = \sum_{A_i \cap A_k \ne \emptyset} m(A_k) = \max_{x_j \in A_i} \pi_j$

and more specifically

$P(\{x_i\}) = \sum_{k=i}^{n} m_k = \pi_i$

Corresponding to a possibility distribution $\pi_f$ for f over X there is a unique mass
assignment m over the subsets $\{A_i\}$ given by the set of formulae above.

Let f be a normalised fuzzy set

$f = x_1/\chi_1 + x_2/\chi_2 + \cdots + x_n/\chi_n$

where $\chi_1 = 1$ and $\chi_1 \ge \chi_2 \ge \cdots \ge \chi_n$

This induces a possibility distribution $\pi_f$ over X given by

$\pi_f(x_i) = \pi_i = \chi_i$

with an associated mass assignment over the nested sets

$A_1 = \{x_1\},\ A_2 = \{x_1, x_2\},\ \ldots,\ A_n = \{x_1, x_2, \ldots, x_n\}$

given by

$m(A_1) = 1 - \chi_2$ ; $m(A_2) = \chi_2 - \chi_3$ ; ... ; $m(A_n) = \chi_n$

This mass assignment represents the family of possible probability distributions over 
X induced by the fuzzy set f. 


We can generalise this to the case of continuous fuzzy sets like those discussed in the
introduction but we will not do this in this paper. The continuous case can always be
treated by approximating the continuous fuzzy set f with membership function $\chi_f$
defined over R by a discrete set of pairs $\{x_i / \chi_i\}$ where $\chi_f(x_i) = \chi_i$ and the
interval R is approximated by the set of points $\{x_1, x_2, \ldots, x_n\}$.


7.2 Examples

We can associate with the fuzzy set defined on {a, b, c, d, e}
f1 = a/0.2 + b/0.4 + c/0.8 + d/1
the mass assignment

{d} : 0.2 ; {c, d} : 0.4 ; {b, c, d} : 0.2 ; {a, b, c, d} : 0.2

We give an example with repeated membership levels:

Associated with the fuzzy set

f2 = a/0.1 + b/0.3 + c/0.3 + d/0.7 + e/1 + f/1

is the mass assignment

{e, f} : 0.3 ; {d, e, f} : 0.4 ; {b, c, d, e, f} : 0.2 ; {a, b, c, d, e, f} : 0.1
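
Both examples follow mechanically from the construction in section 7.1: each distinct membership level defines one nested focal set, and its mass is the drop to the next lower level. A minimal Python sketch is given below. The function name `fuzzy_to_mass` and the dictionary representation of a fuzzy set are assumptions made for illustration.

```python
def fuzzy_to_mass(fuzzy):
    """Convert a normalised discrete fuzzy set into a nested mass assignment.

    `fuzzy` maps each element to its membership level. The result is a list
    of (focal set, mass) pairs, from the smallest focal set to the largest.
    """
    levels = sorted({v for v in fuzzy.values() if v > 0}, reverse=True)
    if not levels or levels[0] != 1:
        raise ValueError("the fuzzy set must be normalised (a level equal to 1)")
    assignment = []
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0.0
        # Elements with equal membership enter the same focal set together.
        focal = frozenset(e for e, v in fuzzy.items() if v >= level)
        assignment.append((focal, round(level - next_level, 10)))
    return assignment


if __name__ == "__main__":
    f1 = {"a": 0.2, "b": 0.4, "c": 0.8, "d": 1.0, "e": 0.0}
    f2 = {"a": 0.1, "b": 0.3, "c": 0.3, "d": 0.7, "e": 1.0, "f": 1.0}
    for name, fuzzy in (("f1", f1), ("f2", f2)):
        print(name, [(sorted(s), m) for s, m in fuzzy_to_mass(fuzzy)])
```

Working over distinct levels rather than over individual elements is what handles the repeated levels of f2 without producing zero-mass focal sets.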
7.3 VOTING MODEL INTERPRETATION

We will use a voting model with constant thresholds to interpret the meaning of a
fuzzy set. Consider the fuzzy set “tall” defined on the height space [4ft, 8ft] by
means of the membership function $\chi_{tall}$. How can we interpret $\chi_{tall}$(5ft 10”)?

Consider a representative population sample of persons, S say. We ask each member
of S to accept or reject the height 5ft 10” as satisfying the concept “tall”. Each
member must accept or reject; there is no allowed abstention. $\chi_{tall}$(5ft 10”) is put
equal to the proportion of S who accept.

We can therefore interpret the fuzzy set
f1 = a/0.2 + b/0.4 + c/0.8 + d/1
as

20% of S accept a as f1
40% of S accept b as f1
80% of S accept c as f1
100% of S accept d as f1
100% of S reject e as f1


One possible voting pattern of acceptances is 


Voter    1    2    3    4    5    6    7    8    9    10

         a    a
         b    b    b    b
         c    c    c    c    c    c    c    c
         d    d    d    d    d    d    d    d    d    d

An alternative pattern is

Voter    1    2    3    4    5    6    7    8    9    10

         a         a
         b    b         b         b
         c         c         c    c    c    c    c    c
         d    d    d    d    d    d    d    d    d    d



The first pattern is more reasonable than the second. In the second pattern voter 3
accepts a which has a low membership level but doesn’t accept b which has a higher
membership level. It seems that anyone who accepts a member with a certain
membership level will accept all members with a higher membership level. This we
call the constant threshold assumption. The first pattern satisfies the constant
threshold assumption. From the first pattern we can deduce
20% of S give acceptance to exactly {d}

40% of S give acceptance to exactly {c, d}

20% of S give acceptance to exactly {b, c, d}

20% of S give acceptance to exactly {a, b, c, d}
and this defines a mass assignment over the nested sets
{d}, {c, d}, {b, c, d}, {a, b, c, d}
namely

{d} : 0.2 ; {c, d} : 0.4 ; {b, c, d} : 0.2 ; {a, b, c, d} : 0.2

We can interpret this mass assignment in the following way. If the population S is
told Z has property f1 and a member of the population drawn at random is asked
what the value of this property taken from {a, b, c, d, e} for Z is, the answer would
be a family of distributions over {a, b, c, d, e} deduced from the mass assignment
above.

This interpretation is consistent with the general method given above. 

This interpretation is not valid if the fuzzy set is non-normalised since the constant
threshold model cannot be satisfied.
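
The step from a voting pattern to a mass assignment can be sketched in Python. The code below is illustrative: it assumes distinct membership levels, encodes the first voting pattern as one accepted set per voter, and uses hypothetical names (`satisfies_constant_threshold`, `mass_from_votes`). The vote {a, c, d} in the usage lines is a made-up example of a voter who accepts a but not b.

```python
from collections import Counter

# Elements of f1 ordered from highest to lowest membership level.
ORDER = ["d", "c", "b", "a"]


def satisfies_constant_threshold(accepted, order=ORDER):
    """A vote is consistent if it is a top segment of the membership order."""
    return set(accepted) == set(order[:len(accepted)])


def mass_from_votes(votes, order=ORDER):
    """Proportion of voters accepting exactly each set of elements."""
    if not all(satisfies_constant_threshold(v, order) for v in votes):
        raise ValueError("pattern violates the constant threshold assumption")
    counts = Counter(frozenset(v) for v in votes)
    return {focal: n / len(votes) for focal, n in counts.items()}


# First pattern: voters 1-2 accept a, b, c, d; voters 3-4 accept b, c, d;
# voters 5-8 accept c, d; voters 9-10 accept d only.
FIRST_PATTERN = ([{"a", "b", "c", "d"}] * 2 + [{"b", "c", "d"}] * 2
                 + [{"c", "d"}] * 4 + [{"d"}] * 2)

if __name__ == "__main__":
    for focal, mass in mass_from_votes(FIRST_PATTERN).items():
        print(sorted(focal), mass)
    # A voter who accepts a but rejects b breaks the assumption.
    print(satisfies_constant_threshold({"a", "c", "d"}))
```

If two elements shared a membership level, the prefix test would need to compare levels rather than positions; that case is left out of this sketch.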

7.4 Semantic Unification 

We discussed the need to determine an interval containing the conditional probability
Pr{age(mary, middle_aged) | age(mary, about_35)} for the example given in the
introduction. This we term semantic unification.

Consider the statements
a is f1
a is f2

where f1 and f2 are fuzzy sets defined on the universe of discourse F. Then we are
interested in determining Pr{a is f1 | a is f2}.

We can associate the mass assignments (m1, F1) and (m2, F2) with f1 and f2
respectively where F1 and F2 are the focal elements and are nested sets.

For any member s1i of F1 and any member s2j of F2 we can determine the support
pair for s1i | s2j from the set {[0, 0], [1, 1], [0, 1]}. Let this be
[Sn(s1i | s2j), Sp(s1i | s2j)].

Let m1 = {m1i} and m2 = {m2j}

Therefore the expected value of Pr(a is f1 | a is f2) is contained in the support pair

[Sn(a is f1 | a is f2), Sp(a is f1 | a is f2)]

where

$S_n(a \text{ is } f_1 \mid a \text{ is } f_2) = \sum_{i,j} m_{1i}\, m_{2j}\, S_n(s_{1i} \mid s_{2j})$

$S_p(a \text{ is } f_1 \mid a \text{ is } f_2) = \sum_{i,j} m_{1i}\, m_{2j}\, S_p(s_{1i} \mid s_{2j})$


7.5 Example

Consider the fuzzy sets defined on {a, b, c, d, e}
f1 = a/0.2 + b/0.4 + c/0.8 + d/1
f2 = a/1 + b/0.8 + c/0.1

The corresponding associated mass assignments are

{d} : 0.2 ; {c, d} : 0.4 ; {b, c, d} : 0.2 ; {a, b, c, d} : 0.2

and

{a} : 0.2 ; {a, b} : 0.7 ; {a, b, c} : 0.1
respectively.

Therefore

S({d} | {a}) = [0, 0]
S({c, d} | {a}) = [0, 0]
S({b, c, d} | {a}) = [0, 0]
S({a, b, c, d} | {a}) = [1, 1]
and

S({d} | {a, b}) = [0, 0]
S({c, d} | {a, b}) = [0, 0]
S({b, c, d} | {a, b}) = [0, 1]
S({a, b, c, d} | {a, b}) = [1, 1]
and

S({d} | {a, b, c}) = [0, 0]
S({c, d} | {a, b, c}) = [0, 1]
S({b, c, d} | {a, b, c}) = [0, 1]
S({a, b, c, d} | {a, b, c}) = [1, 1]
so that

Sn(a is f1 | a is f2) = 0.2*0.2 + 0.2*0.7 + 0.2*0.1 = 0.2

Sp(a is f1 | a is f2) = 0.2*0.2 + 0.2*0.7 + 0.2*0.7 + 0.4*0.1 + 0.2*0.1 + 0.2*0.1 = 0.4
so that

The support pair for the unification of f1 given f2 is
a is f1 | a is f2 : [0.2, 0.4]
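
The example can be reproduced with a short Python sketch. It assumes the rule that the table of pairs above appears to follow: the pair for s1 | s2 is [1, 1] when s2 is contained in s1, [0, 0] when the two sets are disjoint, and [0, 1] otherwise. The function names are hypothetical.

```python
def set_support(s1, s2):
    """Support pair for 's1 given s2' between two crisp sets."""
    if s2 <= s1:            # s2 inside s1: s1 certainly holds
        return (1.0, 1.0)
    if s1 & s2:             # partial overlap: undetermined
        return (0.0, 1.0)
    return (0.0, 0.0)       # disjoint: s1 certainly fails


def semantic_unification(m1, m2):
    """Support pair for 'a is f1 | a is f2' from two mass assignments.

    m1 and m2 are lists of (focal set, mass) pairs.
    """
    necessary = sum(w1 * w2 * set_support(s1, s2)[0]
                    for s1, w1 in m1 for s2, w2 in m2)
    possible = sum(w1 * w2 * set_support(s1, s2)[1]
                   for s1, w1 in m1 for s2, w2 in m2)
    return necessary, possible


M1 = [(frozenset("d"), 0.2), (frozenset("cd"), 0.4),
      (frozenset("bcd"), 0.2), (frozenset("abcd"), 0.2)]
M2 = [(frozenset("a"), 0.2), (frozenset("ab"), 0.7), (frozenset("abc"), 0.1)]

if __name__ == "__main__":
    sn, sp = semantic_unification(M1, M2)
    print(round(sn, 4), round(sp, 4))
```

Because the two lists are ordinary mass assignments, the same function applies unchanged when a fuzzy set is unified with itself.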


7.6 A SPECIAL CASE

Let F = {e1, e2, ..., e10}
and

f = e1/0.1 + e2/0.2 + e3/0.3 + e4/0.4 + e5/0.5
    + e6/0.6 + e7/0.7 + e8/0.8 + e9/0.9 + e10/1.0

then

f | f : [0.55, 1]

Let F = {e1, e2, ..., e10}
and

f = e1/0 + e2/0.1 + e3/0.2 + e4/0.3 + e5/0.4
    + e6/0.5 + e7/0.6 + e8/0.7 + e9/0.8 + e10/0.9

then

f | f : [0.45, 1]

These are two approximations for determining f | f where f is a ramp fuzzy set on an
interval R. In the limit as more and more points in R are used we obtain
f | f : [0.5, 1].

This illustrates how we can deal with continuous fuzzy sets. 
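
The limiting value can be seen from a short calculation. As a sketch, assume the ramp is sampled at n equally spaced membership levels 1/n, 2/n, ..., 1 (the symbol n is introduced here), so that each of the n nested focal sets carries mass 1/n. The necessary support counts the pairs of focal sets in which the conditioning set is contained in the other:

```latex
S_{\mathrm{n}}(f \mid f)
  = \sum_{i=1}^{n} \sum_{j=1}^{i} \frac{1}{n} \cdot \frac{1}{n}
  = \frac{n(n+1)}{2n^{2}}
  = \frac{n+1}{2n}
  \;\longrightarrow\; \frac{1}{2}
  \qquad (n \to \infty)
```

With n = 10 this gives 0.55, the first approximation above. Sampling at the levels 0, 1/n, ..., (n-1)/n instead leaves n-1 non-empty focal sets of mass 1/n and gives (n-1)/(2n), which is 0.45 for n = 10, so the two discretisations approach 0.5 from above and from below.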

7.7 ITERATIVE ASSIGNMENT METHOD WITH SEMANTIC UNIFICATION 

Consider the program discussed in the introduction 

married(X) age(X, middle_aged), has_children(X) : [0.7, 0.9].
age(mary, about_35) 
has_children(mary) : [0.8, 1]. 

We can ask the query 
?- married(mary) 

To answer this query we determine the support pair [x, y] below by the method in the 
last section applied to 
middle_aged I about_35 

age(mary, middle_aged) age(mary, about_35) : [x, y]. 

We then solve 

married(X) age(X, middle_aged), has_children(X) : [0.7, 0.9].
age(mary, middle_aged) : [x, y]. 
has_children(mary) : [0.8, 1]. 


using the iterative assignment method as described previously.

8. CONCLUSIONS

This paper provides a general approach to evidential reasoning when the knowledge
representation is in the form of rules and facts with both probabilistic and fuzzy
uncertainties included. The methods provided can be used for other forms of
knowledge representation, for example, Bayesian networks [PEARL 1988], Moral
Graphs [LAURITZEN, SPIEGELHALTER 1988] with extensions to the case of
uncertain specific information and valuation-based languages for expert systems
[SHENOY 1989]. The non-monotonic case is not seen to be a problem. Without
decomposition methods the approach given here could easily become
computationally excessive. Decomposition methods have only been lightly touched
on in this paper although expressing knowledge in the form of rules provides a
natural decomposition. Inference diagrams can be used to construct a
decomposition from a group of rules. For special cases the decomposition allows the
calculus of support logic programming used in FRIL to be used for answering
queries. The methods given here extend FRIL to cases which cannot be treated by
the present version. The next version will take account of these extensions.

PROBABILISTIC SETS

PROBABILISTIC EXTENSION OF FUZZY SETS


Kaoru Hirota

Dept. of Instrument & Control Engineering, College of
Engineering, Hosei University, 3-7-2 Kajino-cho, Koganei-city,
Tokyo 184, Japan


Introduction 

In the field of pattern recognition or decision making
theory, the following complicated problems have been
left unsolved: (1) ambiguity of objects, (2) variety of
character, (3) subjectivity of observers, (4) evolution of
knowledge or learning. With regard to each problem,
however, there are several general theories: many-valued
logic, fuzzy set theory (in connection with (1) and (2)),
modal logic (in conjunction with (2) and (4)), and
subjective probability (in relation to (3)). It seems,
however, that there are few carefully thought-out
investigations which pay attention to all the problems
mentioned above. In this paper we would like to give our
opinion about these problems and to introduce a new
concept called 'probabilistic sets'.

By giving examples in comparison with fuzzy set theory,
the background idea of probabilistic sets is explained
in Section 2. In probabilistic sets, it is essential to
regard the value of membership functions of fuzzy sets
as a random variable. A probabilistic set A on a total
space X is defined by a defining function $\mu_A(x, \omega)$,
which is a point (i.e. $x \in X$)-wise $(B, B_c)$-measurable function
from a parameter space $(\Omega, B, P)$ to a characteristic space
$(\Omega_c, B_c)$. The parameter space $(\Omega, B, P)$ is a probability
space and is closely related with subjectivity,
personality, and evolution of knowledge. The
characteristic space $(\Omega_c, B_c)$ is a measurable space; we
usually adopt ([0,1], Borel sets) as $(\Omega_c, B_c)$. Section 3
describes definitions of probabilistic sets from a
measure-theoretical viewpoint. The concept of
probabilistic sets includes the concept of classical
fuzzy sets. Some other properties are important results
is that the family of all probabilistic sets constitutes
a complete pseudo-Boolean algebra. In Section 5, some
new concepts are shown such as moment analysis and
expected cardinal numbers. The possibility of moment
analysis is an essential feature of probabilistic sets
and it is a great advantage in applications.


The background idea of probabilistic sets 

Digital computers have been widely used in the fields of
pattern recognition, decision making theory, artificial
intelligence and so on. It should be noted, however, that
these fields involve the following complicated problems to be
solved:

(1) ambiguity of objects, 

(2) variety of property or character, 

(3) subjectivity of observers, 

(4) evolution of knowledge or learning. 

In order to take up these problems, several general 
studies have been made such as fuzzy set theory, many- 
valued logic, modal logic, quantum logic, subjective 
probability. In particular, fuzzy set theory has been 
widely studied. (More than one thousand papers have been 
published since L.A. Zadeh presented fuzzy set theory 
[14].) We have also been studying these problems 
especially by paying attention to inherent and special 
characteristics of pattern recognition and decision 
making theory. Giving an example, we shall deal with the 
background of our idea ’probabilistic sets’ and shall 
compare it with fuzzy sets. 

Let all real numbers be a total space X . Consider all 
numbers nearly equal to one and all numbers nearly equal 
to minus one. In fuzzy set theory, their membership 
functions are shown as in Fig.l. 



Fig. 1. Fuzzy sets; (a) numbers near one, (b) numbers near minus one,
(c) the union (numbers near one or minus one).


Fig. 2. Probabilistic sets; (a) numbers near one, (b) numbers near minus one,
(c) the union (numbers near one or minus one). Columns: (1) defining function,
(2) mean value, (3) variance.


In this situation, however, the following discussion
may be possible: If the degree of ambiguity were
accurately given, it would no longer be ambiguous.
Although mean value or variance may be determined and a
rough tendency may be given, it is impossible in general
to assign definite [0,1]-values. To make matters
worse, the tendency varies according to observers'
subjectivity, situations and so on. Hence we shall
introduce a probability space $(\Omega, B, P)$, called a
parameter space, whose element represents a standard of
judgments. It is assumed that if a standard $\omega\,(\in \Omega)$ is
fixed, the degree of ambiguity of considered objects
(i.e. elements of the total space X) can be definitely
determined. A set of all degrees of ambiguity will be
called a characteristic space $(\Omega_c, B_c)$. We usually adopt
([0,1], Borel sets) as the characteristic space, because
it is an infinite totally ordered set with a maximum
element 1 and a minimum element 0 and because it is in
harmony with characteristic functions of ordinary sets
and membership functions of fuzzy sets. A probabilistic
set on a total space X is defined by giving a $(\Omega_c, B_c)$-valued
random variable on $(\Omega, B, P)$ for each object $x\,(\in X)$,
and this correspondence will be called a defining
function of the probabilistic set. The corresponding
probabilistic sets of Fig. 1 are shown in Fig. 2 (a-1), (b-1),
(c-1). The parameter space $(\Omega, B, P)$ is expected to be
adopted suitably according to each situation, hence in
general no restrictions are added to the parameter space
except that it is a probability space. For example, in the
case of Fig. 2, the parameter space might exist in
observers' subconscious and might be changed according
to estimate it by a statistical method. One of the most
important facts in probabilistic set theory is the
possibility of moment analysis by using the probabilistic
measure P of the parameter space. For instance, in Fig. 2,
mean values and variances are shown in (a-2), (b-2), (c-2)
and (a-3), (b-3), (c-3), respectively. The mean value
indicates the first approximation of probabilistic sets and
might be considered to be the same one as a membership
function. The membership function of 'union' generally has
a continuous but non-smooth (i.e. non-differentiable) point
as shown in Fig. 1(c) (at the point x=0). It will be natural
to expect a smooth curve like Fig. 2(c-2) as the first
approximation of 'numbers nearly equal to one or minus one'.
The variance provides the second information and it indicates
a disordered degree of judgments. Higher moments can be
considered in the same way. Moreover, it can be shown
theoretically that the nth moment around mean value
tends to zero as n tends to infinity (cf. Proposition 9).
Hence, from a practical viewpoint, it is sufficient to
consider only the lower moments, i.e. mean value and
variance. If we consider a probabilistic set with
variance zero, it could be identified with a fuzzy set.
In this sense, it can be concluded that the concepts of
probabilistic sets include classical fuzzy concepts.
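
The idea of moment analysis can be illustrated with a toy computation. The sketch below is not taken from the paper: it assumes a finite parameter space with three hypothetical "standards of judgment", each with its own probability and its own triangular membership curve for 'numbers near one'. All names and numbers (`OMEGA`, `WIDTH`, `mu_near_one`) are made up for the example.

```python
# Hypothetical finite parameter space: P({omega}) for each standard of judgment.
OMEGA = {"strict": 0.2, "moderate": 0.5, "lenient": 0.3}

# Hypothetical half-width of the triangular curve used by each standard.
WIDTH = {"strict": 0.5, "moderate": 1.0, "lenient": 2.0}


def mu_near_one(x, omega):
    """Defining function: degree to which x is 'near one' under standard omega."""
    return max(0.0, 1.0 - abs(x - 1.0) / WIDTH[omega])


def mean(x, mu=mu_near_one):
    """First moment: the fuzzy-set-like first approximation at x."""
    return sum(p * mu(x, w) for w, p in OMEGA.items())


def central_moment(x, n, mu=mu_near_one):
    """n-th moment around the mean; n = 2 is the variance."""
    m = mean(x, mu)
    return sum(p * (mu(x, w) - m) ** n for w, p in OMEGA.items())


if __name__ == "__main__":
    for x in (1.0, 1.5, 2.5):
        print(x, round(mean(x), 4), round(central_moment(x, 2), 6))
```

At x = 1 every standard agrees, so the variance is zero and the probabilistic set behaves like an ordinary fuzzy set at that point; away from 1 the variance measures how much the standards disagree.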

To make our opinion clear, we shall give several
comments. A distinction between the total space X and
the parameter space $(\Omega, B, P)$ is very important in
probabilistic set theory. The concept of probabilistic
sets differs intrinsically from Zadeh's way of thinking
[15] on this point.

A notion of fuzzy set of type 2 (Mizumoto and Tanaka
[12]) is introduced in order to resolve the difficulty
of settling a definite ambiguous degree. Fuzzy set of
type n is also characterized by n step recursively
defined ambiguity. However, the number of steps (i.e. n)
has no upper bound and, to make matters worse, realistic 
meanings decline as n increases. In probabilistic sets, 
the ambiguity is arranged on the parameter space and 
realistic meanings are made clear in connection with the 
subjectivity of observers. 

A family of probabilistic sets constitutes a complete
pseudo-Boolean algebra (cf. Theorem 1). A pseudo-Boolean
algebra is a subclass of distributive lattices (Fig. 3).
Hence, from a lattice theoretical viewpoint (cf. [1]),
probabilistic set theory takes its position between L-fuzzy
set theory [3] and Boolean algebra valued set
theory [13].

In probabilistic set theory, the parameter space
$(\Omega, B, P)$ plays an important role, but it has no
restriction except that it is a probabilistic measure space.
The most important task in applications is the choice of a
suitable parameter space, especially the establishment of the
probabilistic measure P. Finally, we would like to add
that there is no need to think of probabilistic
randomness, like casting a die, in spite of the term
'probabilistic' sets.


Definitions of probabilistic sets 

The above discussion is informal from a mathematical
point of view. The strict definitions are given in this
section. The mathematical foundation of this theory is
measure theory, and some well-known facts of measure
theory will be used (cf. [5]).

First, we would like to define the following three
terms. (Their meanings were discussed in a previous
section.)

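Definition 1.

$(\Omega, \mathcal{B}, P)$ is a parameter space, $(\Omega_c, \mathcal{B}_c) = ([0,1], \text{Borel sets})$
is a characteristic space, and
$M = \{\mu \mid \mu : \Omega \to \Omega_c \text{ is a } (\mathcal{B}, \mathcal{B}_c)\text{-measurable function}\}$
is a family of characteristic variables.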

It is easily shown that M satisfies the following
properties.

Proposition 1.

For arbitrary $\mu_i$'s ($\mu_i \in M$, $i = 1, 2, \ldots$, at most countably
infinite), the following properties are satisfied.

$$ \min(\mu_1, \mu_2) \in M, \tag{1} $$
$$ \max(\mu_1, \mu_2) \in M, \tag{2} $$
$$ \mu = c \in M \text{ where } c \in \Omega_c = [0, 1] \ (\mu : \text{constant fn.}), \tag{3} $$
$$ |\mu_1 - \mu_2| \in M, \tag{4} $$
$$ \lambda \mu_1 + (1 - \lambda) \mu_2 \in M \text{ where } 0 \le \lambda \le 1, \tag{5} $$
$$ \mu^a \in M \text{ where } a \ge 0, \tag{6} $$
$$ \mu_1 \cdot \mu_2 \in M, \tag{7} $$
$$ \inf_{i \ge 1} \mu_i \in M, \tag{8} $$
$$ \sup_{i \ge 1} \mu_i \in M, \tag{9} $$
$$ \liminf_{i \to \infty} \mu_i = \sup_{i \ge 1} \inf_{j \ge i} \mu_j \in M, \tag{10} $$
$$ \limsup_{i \to \infty} \mu_i = \inf_{i \ge 1} \sup_{j \ge i} \mu_j \in M. \tag{11} $$

The fundamental definition of probabilistic sets is
given as follows. Here a total space $X = \{x\}$, which
represents the set of all objects discussed in each
situation, is arbitrarily fixed.

Definition 2.

A probabilistic set A on X is defined by a defining
function $\mu_A$,

$$ \mu_A : X \times \Omega \to \Omega_c, \quad (x, \omega) \mapsto \mu_A(x, \omega), \tag{12} $$

where $\mu_A(x, \cdot)$ is a $(\mathcal{B}, \mathcal{B}_c)$-measurable function for each
fixed $x \in X$.
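
As a minimal sketch of Definition 2, assume a finite parameter space with three points, each carrying a probability weight, and the power set as the sigma-algebra, so that every function on it is measurable. The names `P`, `mu_A` and `is_defining_function`, as well as the sample objects, are illustrative assumptions and do not come from the theory itself.

```python
from fractions import Fraction as F

# Hypothetical finite parameter space: three observers with weights P.
P = [F(1, 2), F(1, 4), F(1, 4)]
assert sum(P) == 1

# Defining function mu_A(x, omega), stored as x -> [value for each omega].
mu_A = {
    "apple": [F(1), F(9, 10), F(4, 5)],
    "tomato": [F(1, 2), F(0), F(1, 5)],
}


def is_defining_function(mu, n_omega):
    """Check that every mu(x, .) maps Omega into Omega_c = [0, 1]."""
    return all(
        len(row) == n_omega and all(0 <= v <= 1 for v in row)
        for row in mu.values()
    )


print(is_defining_function(mu_A, len(P)))
```

Each row plays the role of one characteristic variable $\mu_A(x, \cdot)$; an ordinary fuzzy set would correspond to rows that are constant in $\omega$.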

For any two probabilistic sets A and B, whose
defining functions are $\mu_A(x, \omega)$ and $\mu_B(x, \omega)$, respectively,
A is said to be included in B ($A \subset B$) if for each $x \in X$
there exists $E \in \mathcal{B}$ which satisfies

$$ P(E) = 1, \tag{13} $$
$$ \mu_A(x, \omega) \le \mu_B(x, \omega) \text{ for all } \omega \in E. \tag{14} $$

In this situation we will sometimes use the following
brief notation,

$$ \mu_A(x, \omega) \le \mu_B(x, \omega) \text{ for all } x \in X \text{ and a.e. } \omega \in \Omega. \tag{15} $$


If both $A \subset B$ and $B \subset A$ are satisfied, A and B are
said to be equivalent ($A \equiv B$). (Indeed this relation $\equiv$
is an equivalence relation, i.e. it satisfies reflexivity,
symmetry, and transitivity.) All equivalent
probabilistic sets are considered to be the same one and
are not distinguished. The collection of all probabilistic
sets on X is said to be a family of probabilistic sets and
is denoted by $\mathcal{P}(X)$.
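
The "almost everywhere" inclusion and the resulting equivalence can be sketched on a finite parameter space, where a set E with P(E) = 1 exists exactly when the inequality fails only at points of probability zero. The weights and values below are made-up illustrations, and `included` and `equivalent` are hypothetical helper names.

```python
from fractions import Fraction as F

P = [F(1, 2), F(1, 2), F(0)]  # the third point is a P-null set


def included(mu_a, mu_b, P):
    """A is included in B: mu_a <= mu_b for all x, outside a P-null set."""
    return all(
        all(a <= b for a, b, p in zip(mu_a[x], mu_b[x], P) if p > 0)
        for x in mu_a
    )


def equivalent(mu_a, mu_b, P):
    """A and B are equivalent when each is included in the other."""
    return included(mu_a, mu_b, P) and included(mu_b, mu_a, P)


mu_A = {"x": [F(1, 5), F(1, 2), F(1)]}
mu_B = {"x": [F(1, 5), F(3, 4), F(0)]}
mu_A2 = {"x": [F(1, 5), F(1, 2), F(0)]}  # differs from A only on the null point

print(included(mu_A, mu_B, P), included(mu_B, mu_A, P))
print(equivalent(mu_A, mu_A2, P))
```

Here A is included in B although the inequality fails at the null point, and A and A2 fall into the same equivalence class.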

Note.

An element of $\mathcal{P}(X)$ represents an equivalence class of
M by the equivalence relation $\equiv$ for each $x \in X$.

The inclusion relation in $\mathcal{P}(X)$ satisfies reflexivity,
antisymmetry, and transitivity; hence $(\mathcal{P}(X), \subset)$
constitutes a poset (partially ordered set).


In the following, several operations in $\mathcal{P}(X)$ will be
defined. A fundamental operation in $\mathcal{P}(X)$ is 'union';
however, it is a little complicated. Let $A_\gamma$
($\gamma \in \Gamma$, $\Gamma$: possibly infinite) be probabilistic sets on X
whose defining functions are $\mu_{A_\gamma}(x, \omega)$ respectively. The
union of $\{A_\gamma\}_{\gamma \in \Gamma}$, which is denoted by $\bigcup_{\gamma \in \Gamma} A_\gamma$, is defined
by a defining function $\mu_{\bigcup A_\gamma}(x, \omega)$ which will be given by
the following procedure. For the time being, consider the
case where each $x \in X$ is arbitrarily fixed. Then $\mu_{A_\gamma}(x, \cdot)$
may be regarded as a function of $\omega \in \Omega$ (i.e. an element
of M). Since $\mu_{A_\gamma}(x, \cdot)$ is an $\Omega_c = [0,1]$-valued measurable
function, and since the total measure is finite (i.e.
$P(\Omega) = 1$), $\mu_{A_\gamma}(x, \cdot)$ is always P-integrable:

$$ 0 \le \int_\Omega \mu_{A_\gamma}(x, \omega) \, dP(\omega) \le 1. \tag{16} $$

For arbitrarily fixed n indices $\gamma_1, \gamma_2, \ldots, \gamma_n$ ($\in \Gamma$), the
function $\max\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n\}$ is also an element of M
(see Proposition 1 (2)). Hence it is also P-integrable,

$$ 0 \le \int_\Omega \max\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n\} \, dP(\omega) \le 1. \tag{17} $$


The selection of $\gamma_1, \gamma_2, \ldots, \gamma_n$ from $\Gamma$ is varied. The least
upper bound, denoted by a(x), can be calculated,

$$ a(x) = \sup\left\{ \int_\Omega \max\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n\} \, dP(\omega) \;\middle|\; n \in N \text{ (natural numbers)}, \ \gamma_i \in \Gamma \right\}, \tag{18} $$
$$ 0 \le a(x) \le 1. \tag{19} $$

Since a(x) is a 'least upper bound', there exists a
countably infinite subsequence

$$ \{\max\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n_j\} \mid n_j \in N, \ \gamma_i \in \Gamma\}_{j=1}^{\infty} $$

such that

$$ \lim_{j \to \infty} \int_\Omega \max\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n_j\} \, dP(\omega) = a(x). \tag{20} $$

Although an element $x \in X$ was arbitrarily fixed, this
procedure can be carried out for each $x \in X$. We shall define
the defining function $\mu_{\bigcup A_\gamma}(x, \omega)$ by

$$ \mu_{\bigcup A_\gamma}(x, \omega) = \sup\{\max\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n_j\} \mid 1 \le j < \infty\}. \tag{21} $$

The justification of this definition is ensured by
the following Proposition 2.

Proposition 2.

(1) The union $\bigcup A_\gamma$ is determined uniquely by (21), i.e.
if there exists another countably infinite subsequence
which satisfies (20), the result given by the same
equation as (21) also belongs to the same equivalence
class of M (for each $x \in X$) in the sense of Definition 2.

(2) For all $\gamma \in \Gamma$, we have $A_\gamma \subset \bigcup A_\gamma$.

(3) If there exists an A which satisfies $A_\gamma \subset A$ for
all $\gamma \in \Gamma$, then we have $\bigcup A_\gamma \subset A$.

The proof is omitted here, since it requires some
results in measure theory and a rather long description
(cf. [7]).

Although the procedure for union stated above is rather
complicated, it can be simplified in the case where the
index set $\Gamma$ is at most countably infinite. For example,
the union of A and B (whose defining functions are
$\mu_A(x, \omega)$ and $\mu_B(x, \omega)$, respectively) may be defined by

$$ \mu_{A \cup B}(x, \omega) = \max\{\mu_A(x, \omega), \mu_B(x, \omega)\} \tag{22} $$

for each $x \in X$ and each $\omega \in \Omega$, and the union of $\{A_n\}_{n=1}^{\infty}$ may
be defined by

$$ \mu_{\bigcup A_n}(x, \omega) = \sup\{\mu_{A_n}(x, \omega) \mid 1 \le n < \infty\} \tag{23} $$

for each $x \in X$ and each $\omega \in \Omega$. The complexity in the general
case arises from the fact that M is not always closed
under more than countably many operations (see
Proposition 1).

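The 'intersection' of $\{A_\gamma\}_{\gamma \in \Gamma}$, which is denoted by
$\bigcap_{\gamma \in \Gamma} A_\gamma$, is the dual concept of the 'union' $\bigcup_{\gamma \in \Gamma} A_\gamma$ and is
defined as follows. Put

$$ b(x) = \inf\left\{ \int_\Omega \min\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n\} \, dP(\omega) \;\middle|\; n \in N, \ \gamma_i \in \Gamma \right\}, \tag{24} $$
$$ 0 \le b(x) \le 1, \tag{25} $$

and choose a countably infinite subsequence

$$ \{\min\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n_j\} \mid n_j \in N, \ \gamma_i \in \Gamma\}_{j=1}^{\infty} \tag{26} $$

such that

$$ \lim_{j \to \infty} \int_\Omega \min\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n_j\} \, dP(\omega) = b(x), \tag{27} $$

and define

$$ \mu_{\bigcap A_\gamma}(x, \omega) = \inf\{\min\{\mu_{A_{\gamma_i}}(x, \omega) \mid 1 \le i \le n_j\} \mid 1 \le j < \infty\}. \tag{28} $$

The justification of this definition is also ensured by
the same proposition as Proposition 2. (Change the
symbols $\bigcup$, $\subset$ to $\bigcap$, $\supset$ respectively in Proposition 2.)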
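
For a finite family, the union reduces to the pointwise maximum of (22), and the intersection to the dual pointwise minimum. The sketch below assumes a finite parameter space in which every point has positive probability, so that inclusion can be checked at every point; the helper names and the sample values are hypothetical.

```python
from fractions import Fraction as F


def union(*mus):
    """Pointwise max over a finite family, cf. (22)."""
    return {x: [max(col) for col in zip(*(m[x] for m in mus))] for x in mus[0]}


def intersection(*mus):
    """Pointwise min over a finite family (the dual of union)."""
    return {x: [min(col) for col in zip(*(m[x] for m in mus))] for x in mus[0]}


def included(mu_a, mu_b):
    """Inclusion, assuming every omega has positive probability."""
    return all(a <= b for x in mu_a for a, b in zip(mu_a[x], mu_b[x]))


mu_A = {"x": [F(1, 5), F(3, 4)]}
mu_B = {"x": [F(1, 2), F(1, 4)]}
mu_C = {"x": [F(1, 2), F(1)]}  # an upper bound of both A and B

U = union(mu_A, mu_B)
I = intersection(mu_A, mu_B)
print(U["x"], I["x"])
```

In this example the union contains both A and B and is contained in the upper bound C, which mirrors parts (2) and (3) of Proposition 2 for a two-element family.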

Some other useful concepts and operations on $\mathcal{P}(X)$ can
be defined. They are summarized as follows. The
justification of these definitions is also ensured by
Proposition 1.

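Definition 3.

Total set X

$$ \mu_X(x, \omega) = 1 \text{ for all } x \in X \text{ and a.e. } \omega \in \Omega. \tag{29} $$

(This notation will be omitted until (36).)

Void set (or null set) $\emptyset$

$$ \mu_\emptyset(x, \omega) = 0. \tag{30} $$

Complement of A, $A^c$

$$ \mu_{A^c}(x, \omega) = 1 - \mu_A(x, \omega). \tag{31} $$

Difference $A - B$

$$ \mu_{A - B}(x, \omega) = \max\{0, \mu_A(x, \omega) - \mu_B(x, \omega)\}. \tag{32} $$

Symmetric difference $A \triangle B$

$$ \mu_{A \triangle B}(x, \omega) = |\mu_A(x, \omega) - \mu_B(x, \omega)|. \tag{33} $$

Algebraic sum $A \oplus B$

$$ \mu_{A \oplus B}(x, \omega) = \mu_A(x, \omega) + \mu_B(x, \omega) - \mu_A(x, \omega) \mu_B(x, \omega). \tag{34} $$

$\lambda$-sum $A \overset{\lambda}{+} B$ (where $0 \le \lambda \le 1$)

$$ \mu_{A \overset{\lambda}{+} B}(x, \omega) = \lambda \mu_A(x, \omega) + (1 - \lambda) \mu_B(x, \omega). \tag{35} $$

a-power $A^a$ (where $a \ge 0$)

$$ \mu_{A^a}(x, \omega) = \mu_A(x, \omega)^a. \tag{36} $$

Superior limit of $\{A_n\}_{n=1}^{\infty}$

$$ \limsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k. \tag{37} $$

Inferior limit of $\{A_n\}_{n=1}^{\infty}$

$$ \liminf_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k. \tag{38} $$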
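
The operations (31) to (36) all act pointwise on the values $a = \mu_A(x, \omega)$ and $b = \mu_B(x, \omega)$, so they can be sketched as plain functions on numbers in [0, 1]. The function names and the sample values are illustrative assumptions.

```python
from fractions import Fraction as F


def complement(a):
    """Value of the complement, cf. (31)."""
    return 1 - a


def difference(a, b):
    """Value of the difference A - B, cf. (32)."""
    return max(F(0), a - b)


def sym_difference(a, b):
    """Value of the symmetric difference, cf. (33)."""
    return abs(a - b)


def algebraic_sum(a, b):
    """Value of the algebraic sum, cf. (34)."""
    return a + b - a * b


def lambda_sum(a, b, lam):
    """Value of the lambda-sum with 0 <= lam <= 1, cf. (35)."""
    return lam * a + (1 - lam) * b


def power(a, exponent):
    """Value of the a-power, cf. (36); an integer exponent keeps it exact."""
    return a ** exponent


a, b = F(3, 4), F(1, 2)
print(complement(a), difference(a, b), sym_difference(a, b))
print(algebraic_sum(a, b), lambda_sum(a, b, F(1, 4)), power(a, 2))
```

All six results stay inside [0, 1], as Proposition 1 requires of the underlying characteristic variables.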


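An ordered pair $(\mu_A(x, \omega), \mu_B(x, \omega))$ is said to be a direct
product of A and B, and is denoted by $A \times B$.

$A_y$ is said to be a one point probabilistic set at
$y \in X$ if its defining function $\mu_{A_y}(x, \omega)$ satisfies

$$ \int_\Omega \mu_{A_y}(x, \omega) \, dP(\omega) \begin{cases} > 0 & \text{if } x = y, \\ = 0 & \text{if } x \ne y. \end{cases} \tag{39} $$

$A_y$ is said to be a full one point probabilistic set at
$y \in X$ if its defining function $\mu_{A_y}(x, \omega)$ satisfies

$$ \int_\Omega \mu_{A_y}(x, \omega) \, dP(\omega) = \begin{cases} 1 & \text{if } x = y, \\ 0 & \text{if } x \ne y. \end{cases} \tag{40} $$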


Some properties of probabilistic sets 

Some properties of probabilistic sets can be
characterized from a lattice theoretical viewpoint
(cf. [17]).

A family of probabilistic sets $(\mathcal{P}(X), \subset)$ constitutes a
poset (see the note to Definition 2). For arbitrary
$A, B \in \mathcal{P}(X)$, there exist a supremum $A \cup B$ and an infimum
$A \cap B$ with respect to this partial order $\subset$ (see
Proposition 2 (2) and (3)). Hence the poset $(\mathcal{P}(X), \subset)$ forms
a lattice and the following proposition holds. (Note
that the following set of properties is a necessary
and sufficient condition for being a lattice.)

Proposition 3.

For arbitrary $A, B, C \in \mathcal{P}(X)$, we have

commutativity

$$ A \cup B = B \cup A, \tag{41} $$
$$ A \cap B = B \cap A, \tag{42} $$

associativity

$$ (A \cup B) \cup C = A \cup (B \cup C), \tag{43} $$
$$ (A \cap B) \cap C = A \cap (B \cap C), \tag{44} $$

absorption law

$$ A \cup (A \cap B) = A, \tag{45} $$
$$ A \cap (A \cup B) = A. \tag{46} $$


It is also possible to show that there exist pseudo
complements in $\mathcal{P}(X)$. Let A and B be two arbitrarily fixed
probabilistic sets whose defining functions are
$\mu_A(x, \omega)$ and $\mu_B(x, \omega)$, respectively. Consider the following
equation,

$$ \mu_{A'}(x, \omega) = \begin{cases} 1 & \text{if } \mu_A(x, \omega) \le \mu_B(x, \omega), \\ \mu_B(x, \omega) & \text{if } \mu_A(x, \omega) > \mu_B(x, \omega). \end{cases} \tag{47} $$

For each $x \in X$, $\mu_{A'}(x, \cdot)$ is a $(\mathcal{B}, \mathcal{B}_c)$-measurable function and
is an element of M, since the set $\{\omega \mid \mu_A(x, \omega) \le \mu_B(x, \omega)\}$
belongs to $\mathcal{B}$ (i.e. this set is measurable). Hence it is
possible to define a probabilistic set A' by (47). It is
also clear that A' is the largest probabilistic set among
those $C \in \mathcal{P}(X)$ which satisfy $A \cap C \subset B$. In this sense,
A' is said to be a pseudo complement of A relative to
B. Hence $(\mathcal{P}(X), \subset)$ constitutes a pseudo-Boolean algebra
(cf. Fig. 3). (A pseudo-Boolean algebra is a relatively
pseudo-complemented lattice with a minimum element. In this
case the minimum element is $\emptyset$.)
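
The maximality property of the relative pseudo complement (47) can be checked by brute force on a small example. The sketch below assumes a two-point parameter space in which both points have positive probability and searches a coarse grid of candidate sets C; the helper names and values are hypothetical.

```python
from fractions import Fraction as F
from itertools import product


def rel_pseudo_complement(a, b):
    """Pointwise form of (47) for value lists a = mu_A(x, .), b = mu_B(x, .)."""
    return [F(1) if av <= bv else bv for av, bv in zip(a, b)]


def meet_below(a, c, b):
    """True if min(a, c) <= b at every omega (A intersect C included in B)."""
    return all(min(av, cv) <= bv for av, cv, bv in zip(a, c, b))


a = [F(3, 4), F(1, 4)]
b = [F(1, 2), F(1, 2)]
a_prime = rel_pseudo_complement(a, b)

# Every grid candidate C satisfies the constraint exactly when C <= A'.
grid = [F(k, 4) for k in range(5)]
maximal = all(
    meet_below(a, c, b) == all(cv <= pv for cv, pv in zip(c, a_prime))
    for c in product(grid, repeat=len(a))
)
print(a_prime, maximal)
```

On this grid, a candidate C meets the constraint precisely when it lies below A', which is the sense in which A' is the largest such set.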

Moreover, for arbitrary $\{A_\gamma\}_{\gamma \in \Gamma}$ ($\subset \mathcal{P}(X)$) ($\Gamma$: possibly
infinite), the existence of $\bigcup A_\gamma$ and $\bigcap A_\gamma$ was shown in
a previous section, and they play the roles of a supremum
and an infimum with respect to the order $\subset$. Hence
the lattice $(\mathcal{P}(X), \subset)$ is complete, and so we
can conclude the following Theorem 1 from a lattice
theoretical viewpoint.


[Figure 3: nested classes, from outermost to innermost: poset (partially ordered set), lattice, modular lattice, distributive lattice, pseudo-Boolean algebra, Boolean algebra.]

Fig. 3. An inclusion diagram of various lattices.


Theorem 1.

A family of probabilistic sets $(\mathcal{P}(X), \subset)$ constitutes a
complete pseudo-Boolean algebra.

Note.

In ordinary set theory, the family of all subsets
constitutes a complete Boolean algebra. The difference
between the two is the lack of the complemented law
(i.e. $A \cup A^c \ne X$, $A \cap A^c \ne \emptyset$ in general). In probabilistic set theory,
it is essential to consider ambiguous states, so we can
not get any definite information if we know only that the
considered object is not in one state. In ordinary set
theory, however, we do get definite information from
knowing that it is not in one
state. (Note that ordinary set theory can be considered
to be a two-valued logic.) Hence the lack of the
complemented law is unavoidable in probabilistic set
theory.
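
A small numerical illustration of the missing complemented law, assuming one fixed x and a two-point parameter space with made-up values: for an ambiguous set the union with its complement stays below 1, while for a crisp (0/1-valued) set the law is recovered.

```python
from fractions import Fraction as F


def complement(mu):
    return [1 - v for v in mu]


def union(mu_a, mu_b):
    return [max(a, b) for a, b in zip(mu_a, mu_b)]


def intersection(mu_a, mu_b):
    return [min(a, b) for a, b in zip(mu_a, mu_b)]


ambiguous = [F(1, 2), F(1, 4)]  # mu_A(x, .) on two omegas
crisp = [F(1), F(0)]            # an ordinary, two-valued membership

print(union(ambiguous, complement(ambiguous)))
print(intersection(ambiguous, complement(ambiguous)))
print(union(crisp, complement(crisp)), intersection(crisp, complement(crisp)))
```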

Since the notion of pseudo-Boolean algebra is included
in that of distributive lattice (see Fig. 3), the
distributive law holds in $(\mathcal{P}(X), \subset)$. Moreover, in
connection with its completeness, we can generalize the
commutative law, associative law, distributive law, and
de Morgan's law as follows. (Proofs are omitted here.)

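Proposition 4.

For arbitrary subfamilies of probabilistic sets
$\{A_\gamma\}_{\gamma \in \Gamma}$ and $\{B_\lambda\}_{\lambda \in \Lambda}$,
we have

generalized associative law

$$ \Bigl(\bigcup_{\gamma} A_\gamma\Bigr) \cup \Bigl(\bigcup_{\lambda} B_\lambda\Bigr) = \bigcup_{\gamma, \lambda} (A_\gamma \cup B_\lambda), \tag{48} $$
$$ \Bigl(\bigcap_{\gamma} A_\gamma\Bigr) \cap \Bigl(\bigcap_{\lambda} B_\lambda\Bigr) = \bigcap_{\gamma, \lambda} (A_\gamma \cap B_\lambda), \tag{49} $$

generalized distributive law

$$ \Bigl(\bigcup_{\gamma} A_\gamma\Bigr) \cap \Bigl(\bigcup_{\lambda} B_\lambda\Bigr) = \bigcup_{\gamma, \lambda} (A_\gamma \cap B_\lambda), \tag{50} $$
$$ \Bigl(\bigcap_{\gamma} A_\gamma\Bigr) \cup \Bigl(\bigcap_{\lambda} B_\lambda\Bigr) = \bigcap_{\gamma, \lambda} (A_\gamma \cup B_\lambda), \tag{51} $$

generalized de Morgan's law

$$ \Bigl(\bigcup_{\gamma \in \Gamma} A_\gamma\Bigr)^c = \bigcap_{\gamma \in \Gamma} A_\gamma^c, \tag{52} $$
$$ \Bigl(\bigcap_{\gamma \in \Gamma} A_\gamma\Bigr)^c = \bigcup_{\gamma \in \Gamma} A_\gamma^c. \tag{53} $$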


Some other important properties in $\mathcal{P}(X)$ are
mentioned in the following without proofs.

Proposition 5.

For arbitrary $A, B, C \in \mathcal{P}(X)$, we have

idempotent law

$$ A \cup A = A, \tag{54} $$
$$ A \cap A = A, \tag{55} $$

involution law

$$ A^{cc} = A, \tag{56} $$

elimination law

$$ A \cup B = A \cup C \text{ and } A \cap B = A \cap C \implies B = C, \tag{57} $$

identity law

$$ A \cup X = X, \tag{58} $$
$$ A \cap X = A, \tag{59} $$
$$ A \cup \emptyset = A, \tag{60} $$
$$ A \cap \emptyset = \emptyset. \tag{61} $$


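Proposition 6.

For arbitrary $\{A_n\}_{n=1}^{\infty}$ ($\subset \mathcal{P}(X)$), we have

$$ \liminf_{n \to \infty} A_n \subset \limsup_{n \to \infty} A_n, \tag{62} $$
$$ \liminf_{n \to \infty} A_n^c = \Bigl(\limsup_{n \to \infty} A_n\Bigr)^c. \tag{63} $$

If $A_1 \subset A_2 \subset \cdots \subset A_n \subset \cdots$, then we have

$$ \liminf_{n \to \infty} A_n = \limsup_{n \to \infty} A_n = \bigcup_{n=1}^{\infty} A_n. \tag{64} $$

If $A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$, then we have

$$ \liminf_{n \to \infty} A_n = \limsup_{n \to \infty} A_n = \bigcap_{n=1}^{\infty} A_n. \tag{65} $$

If $A_{2n+1} = A$ and $A_{2n} = B$, then we have

$$ \liminf_{n \to \infty} A_n = A \cap B \text{ and } \limsup_{n \to \infty} A_n = A \cup B. \tag{66} $$

Proposition 7.

Each of $(\mathcal{P}(X), \cup)$, $(\mathcal{P}(X), \cap)$, $(\mathcal{P}(X), \cdot)$, and $(\mathcal{P}(X), \oplus)$
constitutes a commutative monoid (i.e. a commutative
semigroup with a unit) and, for arbitrary $A, B, C \in \mathcal{P}(X)$,
we have

$$ A \triangle B = (A - B) \cup (B - A), \tag{67} $$
$$ A \oplus B = (A^c \cdot B^c)^c, \tag{68} $$
$$ A \cdot B \subset A \cap B \subset A \overset{\lambda}{+} B \subset A \cup B \subset A \oplus B. \tag{69} $$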
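
Relations (67) to (69) can be checked pointwise on a grid of membership values. The sketch assumes that the algebraic product $A \cdot B$ has the pointwise product as its defining function, which is the natural reading but is not spelled out in Definition 3; the grid and the value of `lam` are arbitrary illustrative choices.

```python
from fractions import Fraction as F
from itertools import product

grid = [F(k, 10) for k in range(11)]
lam = F(1, 3)  # any value with 0 <= lam <= 1 would do


def diff(p, q):
    """Value of the difference, cf. (32)."""
    return max(F(0), p - q)


def check(a, b):
    """Check (67), (68) and the chain (69) for one pair of values."""
    alg_sum = a + b - a * b
    ok67 = abs(a - b) == max(diff(a, b), diff(b, a))
    ok68 = alg_sum == 1 - (1 - a) * (1 - b)
    chain = [a * b, min(a, b), lam * a + (1 - lam) * b, max(a, b), alg_sum]
    ok69 = all(u <= v for u, v in zip(chain, chain[1:]))
    return ok67 and ok68 and ok69


all_hold = all(check(a, b) for a, b in product(grid, repeat=2))
print(all_hold)
```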


Note.

In ordinary set theory, it is possible to define
sixteen different kinds of binary operations. (Because
the total space X can be divided into four regions for
arbitrary subsets A and B, there exist $2^4 = 16$
combinations.) Among these sixteen binary operations,
the symmetric difference $A \triangle B$ has a very good property from
an algebraic viewpoint, namely, it constitutes an
Abelian group. In probabilistic set theory, however,
$(\mathcal{P}(X), \triangle)$ does not have such a good property. In
fact, it does not even satisfy the associative law.
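
A concrete counterexample to associativity, using the pointwise form (33) of the symmetric difference at a single point $(x, \omega)$; the three membership values are arbitrary illustrative choices. For crisp values in {0, 1} the operation coincides with exclusive-or and associativity is recovered.

```python
from fractions import Fraction as F
from itertools import product


def sym_diff(a, b):
    """Value of the symmetric difference, cf. (33)."""
    return abs(a - b)


a, b, c = F(9, 10), F(1, 2), F(1, 5)
left = sym_diff(sym_diff(a, b), c)    # (A sym-diff B) sym-diff C
right = sym_diff(a, sym_diff(b, c))   # A sym-diff (B sym-diff C)
print(left, right)

# On two-valued memberships the operation is associative again.
crisp_ok = all(
    sym_diff(sym_diff(p, q), r) == sym_diff(p, sym_diff(q, r))
    for p, q, r in product([0, 1], repeat=3)
)
print(crisp_ok)
```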

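Proposition 8.

Let $X_\gamma$ ($\gamma \in \Gamma$) be total spaces (possibly infinitely many); then
we have

$$ \bigcup_{\gamma \in \Gamma} \mathcal{P}(X_\gamma) = \mathcal{P}\Bigl(\bigcup_{\gamma \in \Gamma} X_\gamma\Bigr), \tag{70} $$
$$ \bigcap_{\gamma \in \Gamma} \mathcal{P}(X_\gamma) = \mathcal{P}\Bigl(\bigcap_{\gamma \in \Gamma} X_\gamma\Bigr). \tag{71} $$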


Some extended concepts of probabilistic sets 
1. Probabilistic mappings 

A mapping f from X to Y is usually defined as a
correspondence from an element $x \in X$ to an element
$y \in Y$. There also exist some variations such as a set
function (a correspondence from a subset $A \subset X$ to an
element $y \in Y$) and a multivalued mapping (a
correspondence from an element $x \in X$ to a subset
$B \subset Y$). The concepts of set functions and multivalued
mappings play an important role in the fields of measure
theory and functional analysis, respectively. In the
field of pattern recognition or learning theory, it is
essential to consider an ambiguous correspondence (i.e.
a probabilistic mapping), which is defined as
follows.

Definition 4.

A probabilistic mapping f from X to Y on a parameter
space $(\Omega_m, \mathcal{B}_m, P_m)$ is defined by

$$ f : X \times \Omega_m \to Y, \quad (x, \omega_m) \mapsto f(x, \omega_m). \tag{72} $$

Some extended concepts can be defined in connection 
with probabilistic mappings, such as induced images and 
induced inverse images of probabilistic sets by a 
probabilistic mapping, and some properties are also 
investigated. However, all of them are omitted here. 

2. Moment analysis 

The parameter space $(\Omega, \mathcal{B}, P)$ is a (probability)
measure space and plays an essential role in
applications of probabilistic set theory. By using the
measure P of this parameter space, we can carry out
moment analysis. The possibility of moment analysis is
one of the most important features of probabilistic set
theory and is not found in other theories.


Definition 5.

Let A be a probabilistic set on X whose defining
function is $\mu_A(x, \omega)$. For each fixed $x \in X$, the mean value
$E(\mu_A)(x)$, variance $V(\mu_A)(x)$, standard deviation $\sigma(\mu_A)(x)$,
nth moment $M^n(\mu_A)(x)$, nth moment around the mean value
$M_0^n(\mu_A)(x)$, and nth absolute moment around the mean value
$|M|_0^n(\mu_A)(x)$ are defined as follows.

$$ E(\mu_A)(x) = \int_\Omega \mu_A(x, \omega) \, dP(\omega) \ (= M^1(\mu_A)(x)), \tag{73} $$
$$ V(\mu_A)(x) = \int_\Omega (\mu_A(x, \omega) - E(\mu_A)(x))^2 \, dP(\omega) \ (= M_0^2(\mu_A)(x)), \tag{74} $$
$$ \sigma(\mu_A)(x) = \sqrt{V(\mu_A)(x)}, \tag{75} $$
$$ M^n(\mu_A)(x) = \int_\Omega \mu_A(x, \omega)^n \, dP(\omega) \quad (n \in N), \tag{76} $$
$$ M_0^n(\mu_A)(x) = \int_\Omega (\mu_A(x, \omega) - E(\mu_A)(x))^n \, dP(\omega), \tag{77} $$
$$ |M|_0^n(\mu_A)(x) = \int_\Omega |\mu_A(x, \omega) - E(\mu_A)(x)|^n \, dP(\omega). \tag{78} $$

The justification of the definitions stated above is
ensured by Proposition 1, and the following properties
follow from these definitions.

Proposition 9.

In the situation of Definition 5, we have

$$ 0 \le E(\mu_A)(x) \le 1 \text{ for all } x \in X, \tag{79} $$
$$ 0 \le \cdots \le M^3(\mu_A)(x) \le M^2(\mu_A)(x) \le M^1(\mu_A)(x) = E(\mu_A)(x) \le \{M^2(\mu_A)(x)\}^{1/2} \le \{M^3(\mu_A)(x)\}^{1/3} \le \cdots \le 1 \text{ for all } x \in X, \tag{80} $$
$$ V(\mu_A)(x) = M^2(\mu_A)(x) - (E(\mu_A)(x))^2 \text{ for all } x \in X, \tag{81} $$
$$ n \ge m \ (\ge 1) \implies 0 \le |M|_0^n(\mu_A)(x) \le |M|_0^m(\mu_A)(x) \le 1 \text{ for all } x \in X, \tag{82} $$
$$ \lim_{n \to \infty} M_0^n(\mu_A)(x) = \lim_{n \to \infty} |M|_0^n(\mu_A)(x) = 0 \text{ for all } x \in X. \tag{83} $$
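
On a finite parameter space the integrals of Definition 5 become weighted sums, so the moments and some of the relations in Proposition 9 can be computed exactly. The weights `P` and the values `mu` below are made-up illustrations for one fixed x, and the function names are hypothetical.

```python
from fractions import Fraction as F

P = [F(1, 2), F(1, 4), F(1, 4)]  # hypothetical weights on three omegas
mu = [F(1), F(1, 2), F(0)]       # mu_A(x, .) for one fixed x


def moment(mu, n):
    """nth moment, cf. (76)."""
    return sum(p * v ** n for v, p in zip(mu, P))


def central_moment(mu, n, absolute=False):
    """nth (absolute) moment around the mean value, cf. (77) and (78)."""
    e = moment(mu, 1)
    return sum(
        p * (abs(v - e) if absolute else v - e) ** n for v, p in zip(mu, P)
    )


mean = moment(mu, 1)
variance = central_moment(mu, 2)
print(mean, variance)
```

With these numbers the mean is 5/8 and the variance is 11/64, which agrees with (81), since the second moment is 9/16.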

Definition 6.

Let A and B be probabilistic sets on X. For each
fixed $x \in X$, the covariance $C(\mu_A, \mu_B)(x)$ and the correlation
coefficient $r(\mu_A, \mu_B)(x)$ are defined by

$$ C(\mu_A, \mu_B)(x) = \int_\Omega (\mu_A(x, \omega) - E(\mu_A)(x)) \cdot (\mu_B(x, \omega) - E(\mu_B)(x)) \, dP(\omega), \tag{84} $$
$$ r(\mu_A, \mu_B)(x) = C(\mu_A, \mu_B)(x) / \sqrt{V(\mu_A)(x) \cdot V(\mu_B)(x)}. \tag{85} $$

(If $V(\mu_A)(x) \cdot V(\mu_B)(x) = 0$, $r(\mu_A, \mu_B)(x)$ is not defined.)


Proposition 10.

In the situation of Definition 6, we have

$$ 0 \le |C(\mu_A, \mu_B)(x)| \le \sqrt{V(\mu_A)(x) \cdot V(\mu_B)(x)} \le 1, \tag{86} $$
$$ 0 \le |r(\mu_A, \mu_B)(x)| \le 1, \tag{87} $$
$$ C(\mu_A, \mu_A)(x) = V(\mu_A)(x), \tag{88} $$
$$ C(\mu_A, \mu_B)(x) = E(\mu_A \cdot \mu_B)(x) - E(\mu_A)(x) \cdot E(\mu_B)(x), \tag{89} $$
$$ r(\mu_A, \mu_B)(x) = \pm 1 \implies \text{there exist real numbers } a \text{ and } b \text{ such that } \mu_A(x, \omega) = a \cdot \mu_B(x, \omega) + b \text{ for a.e. } \omega \in \Omega. \tag{90} $$
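
The covariance and correlation coefficient of Definition 6 can be sketched in the same finite setting. In the made-up example below, `mu_A` is an exact affine function of `mu_B`, so the Cauchy-Schwarz bound (86) is attained and the correlation coefficient equals 1; all names and numbers are illustrative assumptions.

```python
import math
from fractions import Fraction as F

P = [F(1, 2), F(1, 4), F(1, 4)]  # hypothetical weights on three omegas


def mean(mu):
    return sum(p * v for v, p in zip(mu, P))


def variance(mu):
    e = mean(mu)
    return sum(p * (v - e) ** 2 for v, p in zip(mu, P))


def covariance(mu_a, mu_b):
    """Covariance for one fixed x, cf. (84)."""
    ea, eb = mean(mu_a), mean(mu_b)
    return sum(p * (a - ea) * (b - eb) for a, b, p in zip(mu_a, mu_b, P))


def correlation(mu_a, mu_b):
    """Correlation coefficient, cf. (85); None when it is not defined."""
    denom = variance(mu_a) * variance(mu_b)
    if denom == 0:
        return None
    return float(covariance(mu_a, mu_b)) / math.sqrt(float(denom))


mu_B = [F(1), F(1, 2), F(0)]
mu_A = [F(1, 2) * v + F(1, 4) for v in mu_B]  # mu_A = a * mu_B + b

print(covariance(mu_A, mu_B), correlation(mu_A, mu_B))
```

A constant membership function has zero variance, in which case `correlation` returns `None`, matching the remark after (85).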

Definition 7.

Let $A_1, A_2, \ldots, A_n$ be probabilistic sets on X whose
defining functions are $\mu_{A_1}(x, \omega), \mu_{A_2}(x, \omega), \ldots, \mu_{A_n}(x, \omega)$
respectively. For arbitrary $x, y \in X$, the moment matrix
M(x, y) and the variance-covariance matrix V(x, y) of
$A_1, A_2, \ldots, A_n$ are defined by

$$ M(x, y) = [m_{i,j}]_{1 \le i, j \le n} \text{ where } m_{i,j} = \int_\Omega \mu_{A_i}(x, \omega) \cdot \mu_{A_j}(y, \omega) \, dP(\omega), \tag{91} $$
$$ V(x, y) = [v_{i,j}]_{1 \le i, j \le n} \text{ where } v_{i,j} = \int_\Omega (\mu_{A_i}(x, \omega) - E(\mu_{A_i})(x)) \cdot (\mu_{A_j}(y, \omega) - E(\mu_{A_j})(y)) \, dP(\omega). \tag{92} $$


3. Expected cardinal number 

In ordinary set theory a notion of the cardinal number 
of a finite set is defined as the number of elements of 
the set. This concept can be extended to probabilistic 
set theory as follows. 


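Definition 8.

Let A be a probabilistic set on X whose defining
function is $\mu_A(x, \omega)$. The expected support of A is
defined as the following (ordinary) subset of X,

$$ \operatorname{supp} A = \left\{ x \in X \;\middle|\; E(\mu_A)(x) = \int_\Omega \mu_A(x, \omega) \, dP(\omega) > 0 \right\}, \tag{93} $$

and the expected cardinal number of A, denoted by #A,
is defined by

$$ \#A = \begin{cases} \sum_{x \in \operatorname{supp} A} \int_\Omega \mu_A(x, \omega) \, dP(\omega) & \text{if } \#\operatorname{supp} A \le \aleph_0, \\ \#\operatorname{supp} A & \text{if } \#\operatorname{supp} A > \aleph_0. \end{cases} \tag{94} $$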
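
For a finite total space only the first case of (94) arises, and the expected cardinal number is the sum of the mean values over the expected support. The sketch below uses a made-up set on three objects and a two-point parameter space; the names are hypothetical.

```python
from fractions import Fraction as F

P = [F(1, 2), F(1, 2)]  # hypothetical weights on two omegas
mu_A = {
    "x1": [F(1), F(1, 2)],
    "x2": [F(0), F(0)],
    "x3": [F(1, 4), F(1, 4)],
}


def mean(row):
    """Mean value E(mu_A)(x) for one fixed x, cf. (73)."""
    return sum(p * v for v, p in zip(row, P))


def expected_support(mu):
    """Ordinary subset of X on which the mean value is positive, cf. (93)."""
    return {x for x, row in mu.items() if mean(row) > 0}


def expected_cardinal(mu):
    """Expected cardinal number for a finite expected support, cf. (94)."""
    return sum(mean(mu[x]) for x in expected_support(mu))


print(sorted(expected_support(mu_A)), expected_cardinal(mu_A))
```

For an ordinary finite set, whose membership values are all 0 or 1, this sum reduces to the usual number of elements.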


Conclusions 

The background idea of probabilistic sets was discussed
in comparison with fuzzy sets, and its mathematical
structure was explained without proofs. The main results are:
(1) a family of probabilistic sets constitutes a complete
pseudo-Boolean algebra (Theorem 1); (2) the possibility of
moment analysis is a great advantage in applications (cf. [9]).

The concepts of probabilistic sets presented in this
paper seem to provide a new mathematical foundation in
the field of pattern recognition or decision making
theory. Several studies are being done in these fields [9].
We will be glad if our idea is of any help to the people
concerned.